SIP Forum                                                      H. Kaplan
SFSIW-1 Paper                                                Acme Packet
Intended status: Informational                         November 28, 2007

           A Brief List of Common SIP Interoperability Issues
                   draft-kaplan-sip-interop-issues-00

Copyright Notice

Copyright (C) Hadriel Kaplan (2007).

Abstract

This document identifies several commonly found interoperability issues with SIP, for the purpose of stimulating discussion at the first SIP Forum SIP Interoperability Workshop.

Table of Contents

   1. Introduction
   2. Applicability
   3. General Interoperability Issues
      3.1. Configurable settings
      3.2. Legacy RFCs and expired drafts usage
      3.3. Response code issues
      3.4. SIP field lengths
      3.5. SIP and TEL URI formats
   4. Specific Interoperability Issues
      4.1. Offer-less Invites and Re-Invites
      4.2. REGISTER response behavior
      4.3. DTMF Exchange methods
      4.4. IETF vs. 3GPP uses of Service-Route header
      4.5. Competing NAT traversal techniques
      4.6. Call-hold signaling
      4.7. Early and on-hold media
   5. References
   Author's Address

1. Introduction

SIP has grown both in vendor and customer adoption and in protocol complexity. Its many implementations, built on differing assumptions, have led to numerous interoperability issues. Unlike some other protocols, SIP suffers from the lack of both a single dominant vendor and a single autocratic standards body.
The large number of vendors involved, from different regions of the world, and the differences in the needs and wants of those vendors' customers, have led to a complicated interoperability problem space.

Kaplan                   Expires May 1, 2008                    [Page 1]

SIP Interoperability Issues                                November 2007

This paper is a brief list of some of the more common interoperability issues my company has encountered recently. It is by no means an exhaustive list, and while Session Border Controller (SBC) products try to "fix" these issues, there may be better long-term ways of addressing them, so that they do not need to be "repaired" at all.

2. Applicability

This draft focuses on SIP interoperability issues only. Although interoperability issues also exist for SDP, MIME, RTP, RTCP, etc., they are out of scope for this document.

3. General Interoperability Issues

3.1. Configurable settings

One of the most difficult challenges in interworking SIP devices is that so much of the protocol machinery and extension usage is provisioned rather than dynamically learned. Most people think of this in the context of endpoint configuration, but significant provisioning is also performed on proxies, application servers, and other middle-boxes, to the extent that one cannot simply say "device X interoperates with device Y" without specifying the configuration of each device at each end, and of those in between.

This is not a new problem; HTTP and other protocols have similar issues. But the number of hops a SIP message traverses, the variety of device implementations, and the number of available extensions are so much greater for SIP than for perhaps any protocol before it, at such an early stage of the protocol's life, that this endangers the adoption of SIP itself.

I am not sure this is really addressable, short of defining very specific profiles along the lines of the SIP Forum SIP-Connect specification. [Note that, in my opinion, the attempts at doing so thus far have not been completely plug-and-play, because they still define or allow optional behavior.]

3.2. Legacy RFCs and expired drafts usage

There are numerous cases where legacy (obsoleted) RFCs and expired drafts end up in deployed systems and present interoperability problems for newer systems. While it is doubtful this problem can be fixed, one wonders whether, for the most common cases, it would be beneficial to document them so that newer systems are designed to expect the legacy behavior as well as the new. Ignoring legacy usage has generally not succeeded so far. The problem is that systems are already deployed, and new vendors trying to sell products need to make their systems work with the legacy ones, not the other way around.

For example, the Diversion header outlined in [draft-diversion] appears to be far more prevalent than [RFC4244] History-Info. The problem with this type of legacy usage is that one cannot merely support receiving the older syntax: one also needs to decide what to generate as a request gets forwarded (e.g., convert Diversion to History-Info, add Diversion headers, etc.).

3.3. Response code issues

A given response code may be sent for numerous reasons, and in some cases more than one response code may be appropriate, which has led to differing expectations and behaviors. The need to resolve such conflicts between domains of proxies has led to middle-boxes changing the response codes, which may well exacerbate the problem in the future.

In general, the interoperability problems arise where upstream proxies or the UAC automatically re-attempt alternate paths for certain response codes but not for others, and the downstream device cannot know this in advance. For example, a 404 Not Found is commonly returned by a proxy when it cannot find a route to the target, for any number of reasons, and this response causes some upstream nodes to try alternate paths and others not to.
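To make the 404 problem concrete, the following Python sketch models two upstream proxies that differ only in one provisioned setting: the set of final response codes that cause the next candidate route to be tried. The class name, the gateway URIs, and the example code sets are hypothetical assumptions for illustration, not taken from any product or RFC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpstreamProxy:
    """Hypothetical upstream proxy whose failover is driven by provisioning."""
    name: str
    # Provisioned final response codes that cause the next route to be tried
    # instead of relaying the failure upstream.
    reroute_codes: frozenset

    def forward(self, routes, responses):
        """Try routes in preference order.

        routes    -- candidate next hops, in preference order
        responses -- final response code each next hop would return
        Returns (final code relayed upstream, list of routes attempted).
        """
        if not routes:
            raise ValueError("no candidate routes")
        attempted = []
        for route in routes:
            attempted.append(route)
            code = responses[route]
            if 200 <= code < 300 or code not in self.reroute_codes:
                return code, attempted  # success, or a code we do not re-route on
        return code, attempted  # every alternate failed; relay the last failure


# Identical downstream behaviour: the first gateway answers 404, the second 200.
downstream = {"sip:gw-a.example.net": 404, "sip:gw-b.example.net": 200}
routes = list(downstream)

proxy_x = UpstreamProxy("X", frozenset({404, 503}))
proxy_y = UpstreamProxy("Y", frozenset({503}))

print(proxy_x.forward(routes, downstream))
# (200, ['sip:gw-a.example.net', 'sip:gw-b.example.net'])
print(proxy_y.forward(routes, downstream))
# (404, ['sip:gw-a.example.net'])
```

Both are plausible provisioning choices, yet the same downstream 404 produces a completed call through proxy X and a failed call through proxy Y, and the downstream device has no way to tell which behaviour it is dealing with.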
Because a 404 can be returned for a variety of reasons, some of which should cause a re-route and some of which should not, some vendors send response codes other than 404 for those conditions: codes that are more explicit about whether a re-route is the appropriate action.

Another example is 503, which seems to cover everything from temporary overload and administrative-down states to permanent failure, and also serves as a catch-all for anything not easily identified by other codes. Some devices treat this response code as a semi-permanent condition of the next hop and avoid sending any subsequent requests to that next hop for a sustained period of time, which may or may not be the correct action to take. Unfortunately, upstream nodes have no idea which downstream proxy actually generated the 503. [See the "REGISTER response behavior" section for related problems.]

3.4. SIP field lengths

While the RFCs do not define maximum lengths for SIP header fields (values, parameters, etc.), the reality of computing technology is such that vendors often feel compelled to impose maximum lengths on received fields. Whether due to security concerns, product architecture, logging constraints, or something else, many systems cannot or will not handle fields as large as other systems can generate. Although [RFC3261] does define some specific response codes (413/414/513) for this case, they do not fix the underlying interoperability issue: devices cannot simply stop sending larger fields based on a SIP response code. This issue has been appearing more frequently lately, with the use of embedded cookies in URIs and the growth in parameters.

I believe a BCP may help here, if it can define recommended minimum lengths that any SIP device should be able to accept, for both defined and unknown fields. Customers could then require their vendors to comply with the BCP.

3.5. SIP and TEL URI formats

Despite all RFC wording to the contrary, the SIP URI format has seen widespread use as essentially the semantic equivalent of the TEL URI [RFC3966], albeit with different syntax. Many provider systems treat sip:16035551212@example.net as logically equivalent to tel:+16035551212, even though the former has scope local to example.net only, while the latter has global scope.

Part of the reason for this, I believe, is that originating UAs have no real way of knowing when a URI should be one or the other: the user pressed digit buttons and hit "send", and all the UAC can do is send the request to sip:[digits]@[local-domain]. It does not know whether the digits pressed were global in scope, or even whether they form an E.164 number. Only the routing proxies know this, and even then each one only knows the numbers it is responsible for. Thus we see middle-boxes at provider boundaries replacing the domain portion of SIP URIs whenever the user portion looks like an E.164 number.

Furthermore, many systems have been designed or provisioned to handle only one scheme type (i.e., SIP URIs). This has led to cases where requests are rejected unless the appropriate URI scheme is used, and frequently that single scheme must be used in more than just the Request-URI (e.g., in the To and From URIs as well).

This wholesale replacement of schemes and domain names in URIs leads to interoperability issues when the same URIs are expected to be used for end-to-end purposes, in headers or XML bodies that the middle-boxes do not or cannot change. The most recent example is [RFC4474] SIP Identity.

4. Specific Interoperability Issues

4.1. Offer-less Invites and Re-Invites

Although this is clearly a device implementation issue (i.e., a "bug"), we have seen numerous devices from different vendors have trouble handling Invites or re-Invites without SDP.
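For readers less familiar with the delayed-offer model, the following Python sketch builds an offer-less Invite (no body, Content-Length: 0) and records where the SDP offer and answer then travel under the core RFC 3261/RFC 3264 rules, ignoring reliable provisional responses. The helper names and all header values (URIs, tags, branch, Call-ID) are illustrative placeholders; this is not a complete or validated SIP message builder.

```python
from typing import Optional, Tuple


def build_invite(target: str, from_uri: str, call_id: str,
                 sdp: Optional[str] = None) -> str:
    """Build a minimal INVITE; pass sdp=None for an offer-less INVITE."""
    body = sdp if sdp is not None else ""
    lines = [
        f"INVITE {target} SIP/2.0",
        "Via: SIP/2.0/UDP client.example.com:5060;branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        f"To: <{target}>",
        f"From: <{from_uri}>;tag=1928301774",
        f"Call-ID: {call_id}",
        "CSeq: 314159 INVITE",
        "Contact: <sip:alice@client.example.com>",
    ]
    if sdp is not None:
        lines.append("Content-Type: application/sdp")
    # Content-Length counts bytes of the body, so encode before measuring.
    lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


def offer_answer_messages(invite_has_offer: bool) -> Tuple[str, str]:
    """Messages carrying (offer, answer) for an INVITE answered with a 2xx,
    considering only the core rules (no reliable provisional responses)."""
    if invite_has_offer:
        return ("INVITE", "2xx")
    return ("2xx", "ACK")  # delayed offer: offer in the 2xx, answer in the ACK


invite = build_invite("sip:bob@example.net", "sip:alice@example.com",
                      "a84b4c76e66710")
print(invite.endswith("Content-Length: 0\r\n\r\n"))   # True
print(offer_answer_messages(invite_has_offer=False))  # ('2xx', 'ACK')
```

An intermediate device whose routing or admission logic reads codecs or bandwidth from the Invite body finds nothing to inspect in such a request, which is the failure mode described next.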
For initial Invites without SDP, the root cause of failure is often that the request routing or admission logic of intermediate devices depends on the SDP: for example, devices that route calls based on codec, bandwidth allocation devices, or third-party call control (3PCC) transcoding devices that themselves send out offer-less Invites but did not expect to receive them. (Apparently their developers never considered that a call could cross two such systems!) For re-Invites, the delayed SDP offer model is used for very specific use cases which are common, but were simply not envisioned by the developers of the UAs.

The only recommendation this paper puts forth is that offer-less Invite usage be specifically documented in a separate RFC or BCP, in the hope that vendors will become aware of its usage and that customers will explicitly ask for compliance with that RFC. Don't hide this in a generic "Invite call flows BCP".

4.2. REGISTER response behavior

Another class of response-related interoperability problems concerns how UAs handle responses to REGISTER and SUBSCRIBE requests. For example, only a minority of UAs properly support 3xx redirects for REGISTER, even though this would be a useful mechanism for load-balancing. For REGISTER requests specifically, it would be beneficial to have explicit documentation of the actions the UAC should perform.

To reinforce this point, consider that UAs perform registrations and subscriptions in a fairly automatic fashion with little user interaction, so the way in which they treat specific response codes can have dramatic consequences. For example, it is not well-defined what a UA should do when its REGISTER is rejected with a 404, or even a 503, and hardly any UAs honor the Retry-After header. A very few UAs will give up altogether and wait for user input; some will wait a few minutes and try again, indefinitely; some will re-attempt their registration almost immediately (or even faster) and never give up.
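As a point of comparison for the behaviours just listed, the following Python sketch shows one hypothetical retry policy a UAC could apply after a failed REGISTER: honour Retry-After when present, stop and wait for the user on codes assumed to need human action, and otherwise back off exponentially up to a cap. No RFC mandates this policy; the function name, the 30-second base delay, the 30-minute cap, and the choice of 403 and 404 as "give up" codes are all assumptions.

```python
from typing import Optional

# Illustrative assumptions, not values taken from any specification.
BASE_DELAY_S = 30              # delay after the first failure
MAX_DELAY_S = 1800             # never wait longer than 30 minutes
GIVE_UP_CODES = {403, 404}     # assumed to need user or administrator action


def next_register_delay(status: int, failures: int,
                        retry_after: Optional[int] = None) -> Optional[int]:
    """Seconds to wait before the next REGISTER, or None to stop retrying
    and wait for user input.

    status      -- final failure response to the last REGISTER (4xx-6xx)
    failures    -- consecutive failed attempts so far, starting at 1
    retry_after -- Retry-After value in seconds, if the response carried one
    """
    if status < 400:
        raise ValueError("only failure responses are handled here")
    if failures < 1:
        raise ValueError("failures starts at 1")
    if retry_after is not None:
        return retry_after     # the server said explicitly when to come back
    if status in GIVE_UP_CODES:
        return None            # retrying automatically is unlikely to help
    # Exponential back-off: 30, 60, 120, ... seconds, capped at MAX_DELAY_S.
    return min(BASE_DELAY_S * 2 ** (failures - 1), MAX_DELAY_S)


print(next_register_delay(503, failures=1, retry_after=300))  # 300
print(next_register_delay(503, failures=4))                   # 240
print(next_register_delay(500, failures=10))                  # 1800
print(next_register_delay(404, failures=1))                   # None
```

The exact numbers matter less than the fact that deployed UAs make different choices at each of these branches, and, as noted above, none of those choices is well-defined for REGISTER today.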
This inconsistency creates numerous problems in large network deployments, and has led SBC vendors to implement various protection schemes, from dynamic hardware ACLs to even sending a 200 OK just to quiet the UA.

4.3. DTMF Exchange methods

DTMF is one of the most basic features consumers expect from "phone calls", yet it still has interoperability issues in the real world. [RFC2833] defines how to convey DTMF in the media plane, while [RFC4730] KPML defines how to do so in the signaling plane. Unfortunately, the most common signaling-plane mechanism we have found carries DTMF in INFO messages, but it is not documented in an RFC and lacks any means of negotiating support. (Whether adding such support will succeed remains to be seen.)

4.4. IETF vs. 3GPP uses of Service-Route header

While there are undoubtedly entire classes of interoperability problems associated with the IETF vs. 3GPP/TISPAN models, only one is mentioned here: the [RFC3608] Service-Route header. The [RFC3608] mechanism defined by the IETF leads an IETF-compliant UA to route requests based on the received Service-Route header, but in 3GPP the Service-Route header does not include the P-CSCF first-hop proxy that must actually be traversed. [Note: what's more, for IMS-AKA the P-CSCF's port changes after registration, so the Service-Route would be wrong even if it did include the P-CSCF.]

Furthermore, [RFC3608] states that the Service-Route applies to the entire Address-of-Record, which implies the same route for all contacts of that AoR. In specific load-balancing and visited-network scenarios, however, two registering contacts for the same AoR may traverse two different sets of outbound proxies, or even different Registrars, and need a different Service-Route per contact. Some Registrars, proxies, and/or UAs comply with the RFC verbatim and essentially break the path of one of the contacts.

4.5. Competing NAT traversal techniques

Until the mechanism in [sip-outbound] achieves widespread deployment, vendors employ multiple techniques for NAT traversal of SIP signaling, which can lead to interoperability problems. Many server-side vendors use a REGISTER refresh approach, whereby the UA is given a short REGISTER expiration time in order to keep the NAT pinhole open. Other, client-side vendors attempt to auto-discover that a NAT exists, by looking at the Via received parameter in responses, by assuming that a local RFC 1918 address means the UA is behind a NAT, or by providing user-settable check-boxes, and then send OPTIONS, CRLF, or proprietary methods to keep the pinhole open.

Unfortunately, when the client-side and server-side implementations don't agree, the result is sometimes unexpected: attack detection and dynamic blacklisting on the server side, or failure of media NAT traversal (e.g., if the server side does not believe the UA is behind a NAT because the UA compensates for the NAT itself in signaling, but not in media). In response, server-side vendors have created counter-measures that keep the UA from detecting that it is behind a NAT when it really is. Hopefully [sip-outbound] will do away with this continual arms race.

4.6. Call-hold signaling

The legacy call-hold mechanism defined in [RFC2543], which sets the SDP connection address to 0.0.0.0, is unfortunately far from obsolete in usage, despite the superior direction attribute concept of [RFC3264]. To increase interoperability, some devices send both in the re-Invite, which defeats the purpose of using a direction attribute (e.g., keeping RTCP flowing). Other vendors send the direction attribute first and, if the SDP answer does not mirror it, fall back to the legacy approach, which leads to extraneous signaling overhead. An IETF recommendation/BCP for this is probably warranted.
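A receiver that must cope with both conventions ends up checking for either one. The following Python sketch classifies an SDP offer as a hold request if it uses the legacy [RFC2543] c=0.0.0.0 convention or an [RFC3264] sendonly/inactive direction attribute. It is deliberately simplified: it assumes a single IPv4 media stream and lets the last connection line and direction attribute seen win (so a media-level value overrides a session-level one). The function name and the sample SDP fragments, which are abbreviated and omit required lines such as o=, s= and t=, are illustrative.

```python
def is_hold_offer(sdp: str) -> bool:
    """Return True if an SDP offer puts its (single) media stream on hold.

    Accepts both the legacy RFC 2543 style (connection address 0.0.0.0)
    and the RFC 3264 style (a=sendonly or a=inactive).
    """
    connection_zero = False
    direction = "sendrecv"  # RFC 3264 default when no direction attribute
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("c=IN IP4 "):
            # c=<nettype> <addrtype> <address>; a media-level c= line comes
            # after the session-level one and therefore overrides it here.
            connection_zero = line.split()[2] == "0.0.0.0"
        elif line in ("a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"):
            direction = line[2:]
    return connection_zero or direction in ("sendonly", "inactive")


legacy = "v=0\r\nc=IN IP4 0.0.0.0\r\nm=audio 49170 RTP/AVP 0\r\n"
modern = "v=0\r\nc=IN IP4 192.0.2.10\r\nm=audio 49170 RTP/AVP 0\r\na=sendonly\r\n"
active = "v=0\r\nc=IN IP4 192.0.2.10\r\nm=audio 49170 RTP/AVP 0\r\n"
print(is_hold_offer(legacy), is_hold_offer(modern), is_hold_offer(active))
# True True False
```

An offer that matches both tests is the "send both" case described above; since a 0.0.0.0 connection address leaves the peer no address to send RTCP to, it gives up the main benefit of the direction attribute.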
In hindsight, [RFC3264] should have been backwards compatible (e.g., still using the 0.0.0.0 syntax, with some new attribute for the on-hold connection address that legacy devices would ignore but newer ones would use). [Note: I recognize this is SDP, not SIP, but it is a big deal and was caused by [RFC2543].]

4.7. Early and on-hold media

Several issues with early media were discussed in [stucker-early-media] and [stucker-middleboxes], and they have yet to be resolved. It is not clear whether there is consensus that a problem even exists, but there is one. The fact is that there are things that go bump in the night (or bump in the wire, as it were): constrained network resources, issues with fraud, and general security concerns, along with architectures which "solve" these issues with gates. What's more, NATs themselves cause similar issues.

Furthermore, forking and on-hold scenarios have led to issues with the media that is played. For example, one not-uncommon on-hold scenario has a media server sending music RTP to the on-hold party. This works fine in a closed environment, but breaks down when the call being put on hold has also traversed the PSTN or another domain, so that multiple parties end up sending music. Some UAs choose one stream to render; others play both simultaneously, with poor results. The issue, I believe, is that these media servers send media without actually taking part in the SDP offer/answer exchange as UAs, and instead assume they can simply send media as an unidentified third party (which is technically valid, but not realistically sound). I believe the "correct" thing for them to do is to act as true B2BUAs.

5. References

[RFC2543] Handley, M., Schulzrinne, H., Schooler, E., and J. Rosenberg, "SIP: Session Initiation Protocol", RFC 2543, March 1999.
[RFC2833] Schulzrinne, H. and S. Petrack, "RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals", RFC 2833, May 2000. (Obsoleted by RFC 4733, Schulzrinne, H. and T. Taylor, December 2006.)

[RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with the Session Description Protocol (SDP)", RFC 3264, June 2002.

[RFC3608] Willis, D. and B. Hoeneisen, "Session Initiation Protocol (SIP) Extension Header Field for Service Route Discovery During Registration", RFC 3608, October 2003.

[RFC3966] Schulzrinne, H., "The tel URI for Telephone Numbers", RFC 3966, December 2004.

[RFC4244] Barnes, M., "An Extension to the Session Initiation Protocol (SIP) for Request History Information", RFC 4244, November 2005.

[RFC4474] Peterson, J. and C. Jennings, "Enhancements for Authenticated Identity Management in the Session Initiation Protocol (SIP)", RFC 4474, August 2006.

[RFC4730] Burger, E. and M. Dolly, "A Session Initiation Protocol (SIP) Event Package for Key Press Stimulus (KPML)", RFC 4730, November 2006.

[sip-outbound] Jennings, C. and R. Mahy, "Managing Client Initiated Connections in the Session Initiation Protocol (SIP)", draft-ietf-sip-outbound-11.txt, 2007.

[draft-diversion] Levy, S. and J.R. Yang, "Diversion Indication in SIP", draft-levy-sip-diversion-08.txt, August 2004.

[stucker-early-media] Stucker, B., "Coping with Early Media in the Session Initiation Protocol (SIP)", draft-stucker-sipping-early-media-coping-03.txt, October 2006.

[stucker-middleboxes] Stucker, B. and H. Tschofenig, "Analysis of Middlebox Interactions for Signaling Protocol Communication along the Media Path", draft-sipping-stucker-media-path-middleboxes-00.txt, November 2007.

Author's Address

Hadriel Kaplan
Acme Packet
71 Third Ave.
Burlington, MA 01803, USA
Email: hkaplan@acmepacket.com