After reading this chapter, you should be able to perform the following tasks:
Describe Voice over IP (VoIP), components of a VoIP network, the protocols used, and service considerations of integrating VoIP into an existing data network.
Describe various types of voice gateways and how to use gateways in different IP telephony environments.
Voice
over Internet Protocol (VoIP) allows a voice-enabled router to carry
voice traffic, such as telephone calls and faxes, over an Internet
Protocol (IP) network. This chapter introduces the fundamentals of
VoIP, the various types of voice gateways, and how to use gateways in
different IP telephony environments.
VoIP Fundamentals
Voice over IP is also known as VoIP. You might also hear VoIP referred to as IP Telephony.
Both terms refer to sending voice across an IP network. However, the
primary distinction revolves around the endpoints in use. For example,
in a VoIP network, traditional analog or digital circuits connect into
an IP network, typically through some sort of gateway. In contrast, an IP
telephony environment contains endpoints that natively communicate
using IP. Be aware that much of the literature on the subject,
including this book, might use these terms interchangeably.
VoIP routes voice
conversations over IP-based networks, including the Internet. VoIP has
made it possible for businesses to realize cost savings by utilizing
their existing IP network to carry voice and data, especially where
businesses have underutilized network capacity that can carry VoIP at
no additional cost. This section introduces VoIP, the required
components in VoIP networks, currently available VoIP signaling
protocols, VoIP service issues, and media transmission protocols.
Cisco Unified Communications Architecture
The Cisco Unified
Communications System fully integrates communications by enabling data,
voice, and video to be transmitted over a single network infrastructure
using standards-based IP. Leveraging the framework provided by Cisco IP
hardware and software products, the Cisco Unified Communications System
has the capability to address current and emerging communications needs
in the enterprise environment. The Cisco Unified Communications family
of products is designed to optimize feature functionality, reduce
configuration and maintenance requirements, and provide
interoperability with a variety of other applications. The Cisco
Unified Communications System provides and maintains a high level of
availability, quality of service (QoS), and security for the network.
The Cisco Unified Communications System incorporates and integrates the following communications technologies:
IP telephony:
IP telephony refers to technology that transmits voice communications
over a network using IP standards. Cisco Unified Communications System
includes hardware and software products such as call processing agents,
IP phones (both wired and wireless), voice messaging systems, video
devices, and other special applications.
Customer contact center:
Cisco IP Contact Center products combine strategy with architecture to
enable efficient and effective customer communications across a global
network. This allows organizations to draw from a broader range of
resources to service customers. These resources include access to a
large pool of customer service agents and multiple channels of
communication as well as customer self-help tools.
Video telephony:
The Cisco Unified Video Advantage products enable real-time video
communications and collaboration using the same IP network and call
processing agent as Cisco Unified Communications. With Cisco Unified
Video Advantage, making a video call is just as easy as dialing a phone
number.
Rich-media conferencing:
Cisco Conference Connection and Cisco Unified MeetingPlace enhance the
virtual meeting environment with an integrated set of IP-based tools
for voice, video, and web conferencing.
Third-party applications:
Cisco works with other companies to provide a selection of third-party
IP communications applications and products. This helps businesses
focus on critical needs such as messaging, customer care, and workforce
optimization.
VoIP Overview
VoIP is the family of
technologies that allows IP networks to be used for voice applications,
such as telephony, voice instant messaging, and teleconferencing. VoIP
defines a way to carry voice calls over an IP network, including the
digitization and packetization of the voice streams. IP telephony builds
on VoIP standards to create a telephony system in which higher-level
features such as advanced call routing, voice mail, and contact centers
can be used.
VoIP services convert your
voice into a digital signal that travels over an IP-based network. If
you are calling a traditional phone number, the signal is converted to
a traditional telephone signal before it reaches its destination. VoIP
allows you to make a call directly from a computer, a VoIP phone, or a
traditional analog phone connected to a special adapter. In addition,
wireless "hot spots" in locations such as airports, parks, and cafes
that allow you to connect to the Internet might enable you to use VoIP
services.
Business Case for VoIP
The business advantages that
drive the implementation of VoIP networks have changed over time.
Starting with simple media convergence, these advantages evolved to
include call-switching intelligence and the total user experience.
Originally, ROI
calculations centered on toll-bypass and converged-network savings.
Although these savings are still relevant today, advances in voice
technologies allow organizations and service providers to differentiate
their product offerings by providing the following:
Cost savings:
Traditional time-division multiplexing (TDM), which is used in the
public switched telephone network (PSTN) environment, dedicates 64 kbps
of bandwidth per voice channel. This approach results in bandwidth
being unused when no voice traffic exists. VoIP shares bandwidth across
multiple logical connections, which results in a more efficient use of
the bandwidth, thereby reducing bandwidth requirements. A substantial
amount of equipment is needed to combine 64-kbps channels into
high-speed links for transport across a network. Packet telephony uses
statistical multiplexing to carry voice traffic alongside data traffic.
This consolidation results in substantial savings on capital equipment
and operations costs.
Flexibility:
The sophisticated functionality of IP networks allows organizations to
be flexible in the types of applications and services they provide to
their customers and users. Service providers can easily segment
customers. This helps them to provide different applications, custom
services, and rates depending on traffic volume needs and other
customer-specific factors.
Advanced features: Following are some examples of the advanced features provided by current VoIP applications:
Advanced call routing:
When multiple paths exist to connect a call to its destination, some of
these paths might be preferred over others based on cost, distance,
quality, partner handoffs, traffic load, or various other
considerations. Least-cost routing and time-of-day routing are two
examples of advanced call routing that can be implemented to determine
the best possible route for each call.
Unified messaging:
Unified messaging improves communications and productivity. It provides
a single user interface for messages that have been delivered over a
variety of mediums. For example, users can read their e-mail, hear
their voice mail, and view fax messages by accessing a single inbox.
Integrated information systems:
Organizations use VoIP to effect business process transformation. These
processes include centralized call control, geographically dispersed
virtual contact centers, and access to resources and self-help tools.
Long-distance toll bypass:
Long-distance toll bypass is an attractive solution for organizations
that place a significant number of calls between sites that are charged
traditional long-distance fees. In this case, it might be more
cost-effective to use VoIP to place those calls across an IP network.
If the IP WAN becomes congested, calls can overflow into the PSTN,
ensuring that no degradation occurs in voice quality.
Security:
Mechanisms in an IP network allow an administrator to ensure that IP
conversations are secure. Encryption of sensitive signaling header
fields and message bodies protects packets in case of unauthorized
packet interception.
Customer relationships:
The capability to provide customer support through multiple mediums,
such as telephone, chat, and e-mail, builds solid customer satisfaction
and loyalty. A pervasive IP network allows organizations to provide
contact center agents with consolidated and up-to-date customer records
along with related customer communication. Access to this information
allows quick problem solving, which builds strong customer
relationships.
Telephony application services:
XML services on Cisco IP Phones give users another way to perform or
access business applications. Some examples of XML-based services on
Cisco IP Phones are user stock quotes, inventory checks, direct-dial
directory, announcements, and advertisements. Some Cisco IP Phones are
equipped with a pixel-based display that can display full graphics
instead of just text in the window. The pixel-based display
capabilities allow you to use sophisticated graphical presentations for
applications on Cisco IP Phones and make them available at any desktop,
counter, or location.
Components of a VoIP Network
Figure 1-1 depicts the basic components of a packet voice network.

The following is a description of these basic components:
IP Phones: Cisco IP Phones provide IP endpoints for voice communication.
Gatekeeper: A gatekeeper provides Call Admission Control (CAC), bandwidth control and management, and address translation.
Gateway:
The gateway provides translation between VoIP and non-VoIP networks,
such as the PSTN. Gateways also provide physical access for local
analog and digital voice devices, such as telephones, fax machines, key
sets, and private branch exchanges (PBX).
Multipoint Control Unit (MCU): An MCU provides real-time connectivity for participants in multiple locations to attend the same videoconference or meeting.
Call agent:
A call agent provides call control for IP phones, CAC, bandwidth
control and management, and address translation. Unlike a gatekeeper,
which in a Cisco environment typically runs on a router, a call agent
typically runs on a server platform. Cisco Unified Communications
Manager is an example of a call agent.
Application servers: Application servers provide services such as voice mail, unified messaging, and Cisco Unified Communications Manager Attendant Console.
Videoconference station:
A videoconference station provides access for end-user participation in
videoconferencing. The videoconference station contains a video capture
device for video input and a microphone for audio input. A user can
view video streams and hear audio that originates at a remote user
station.
Other components,
such as software voice applications, interactive voice response (IVR)
systems, and soft phones, provide additional services to meet the needs
of an enterprise site.
VoIP Functions
In the traditional PSTN
telephony network, all the elements required to complete a call are
transparent to an end user. Migration to VoIP requires an awareness of
these required elements and a thorough understanding of the protocols
and components that provide the same functionality in an IP network.
Required VoIP functionality includes these functions:
Signaling:
Signaling is the capability to generate and exchange control
information that will be used to establish, monitor, and release
connections between two endpoints. Voice signaling requires the
capability to provide supervisory, address, and alerting functionality
between nodes. The PSTN network uses Signaling System 7 (SS7) to
transport control messages. SS7 uses out-of-band signaling, which, in
this case, is the exchange of call control information in a separate
dedicated channel.
VoIP
presents several options for signaling, including H.323, Session
Initiation Protocol (SIP), H.248, Media Gateway Control Protocol
(MGCP), and Skinny Client Control Protocol (SCCP). Some VoIP gateways
are also capable of initiating SS7 signaling directly to the PSTN
network. Signaling protocols are classified as either peer-to-peer or
client/server protocols.
SIP and H.323 are examples of
peer-to-peer signaling protocols where the end devices or gateways
contain the intelligence to initiate and terminate calls and interpret
call control messages. H.248, SCCP, and MGCP are examples of
client/server protocols where the endpoints or gateways do not contain
call control intelligence but send or receive event notifications to a
server commonly referred to as a call agent.
For example, when an MGCP gateway detects a telephone that has gone off
hook, it does not know to automatically provide a dial tone. The
gateway sends an event notification to the call agent, telling the
agent that an off-hook condition has been detected. The call agent
notifies the gateway to provide a dial tone.
Database services:
Access to services, such as toll-free numbers or caller ID, requires
the capability to query a database to determine whether the call can be
placed or information can be made available. Database services include
access to billing information, caller name delivery (CNAM), toll-free
database services, and calling-card services. VoIP service providers
can differentiate their services by providing access to many unique
database services. For example, to simplify fax access to mobile users,
a provider can build a service that converts fax to e-mail. Another
example is providing a call notification service that places outbound
calls with prerecorded messages at specific times to notify users of
such events as school closures, wake-up calls, or appointments.
Bearer control:
Bearer channels are the channels that carry voice calls. Proper
supervision of these channels requires that appropriate call connect
and call disconnect signaling be passed between end devices. Correct
signaling ensures that the channel is allocated to the current voice
call and that a channel is properly deallocated when either side
terminates the call. Connect and disconnect messages are carried by SS7
in the PSTN network. Connect and disconnect messages are carried by SIP,
H.323, H.248, or MGCP within the IP network.
Codecs:
Codecs provide the coding and decoding translation between analog and
digital facilities. Each codec type defines the method of voice coding
and the compression mechanism that is used to convert the voice stream.
The PSTN uses TDM to carry each voice call. Each voice channel reserves
64 kbps of bandwidth and uses the G.711 codec to convert an analog
voice wave to a 64-kbps digitized voice stream. In VoIP design, codecs
might compress voice beyond the 64-kbps voice stream to allow more
efficient use of network resources. The most widely used codec in the
WAN environment is G.729, which compresses the voice stream to 8 kbps.
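The codec rates above translate into per-call IP bandwidth once packet headers are added. The following is a minimal sketch of that arithmetic, not a sizing tool. It assumes a 20-ms packetization interval and uncompressed IPv4, UDP, and RTP headers (20 + 8 + 12 bytes), and it ignores Layer 2 overhead; the function names are illustrative only.

```python
# Per-call bandwidth at the IP layer for one direction of a call.
# Assumptions (not from the text): 20-ms packetization, no cRTP,
# Layer 2 overhead ignored.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP fixed headers

CODEC_RATES_BPS = {"G.711": 64_000, "G.729": 8_000}


def payload_bytes(codec_bps: int, packet_ms: int = 20) -> int:
    """Bytes of encoded voice carried in one packet."""
    return codec_bps * packet_ms // 1000 // 8


def per_call_bps(codec_bps: int, packet_ms: int = 20) -> int:
    """IP-layer bits per second: (payload + headers) times packets per second."""
    packets_per_second = 1000 // packet_ms
    packet_bits = (payload_bytes(codec_bps, packet_ms) + IP_UDP_RTP_HEADER_BYTES) * 8
    return packet_bits * packets_per_second


for name, rate in CODEC_RATES_BPS.items():
    print(f"{name}: {payload_bytes(rate)} bytes of voice per packet, "
          f"{per_call_bps(rate) / 1000:.0f} kbps per call")
```

Under these assumptions the header overhead is the same 40 bytes for both codecs, so it is proportionally much larger for G.729 than for G.711.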
VoIP Signaling Protocols
VoIP uses several control and call-signaling protocols. Among these are:
H.323:
H.323 is a standard that specifies the components, protocols, and
procedures that provide multimedia communication services, real-time
audio, video, and data communications over packet networks, including
IP networks. H.323 is part of a family of International
Telecommunication Union Telecommunication Standardization sector
(ITU-T) recommendations called H.32x that provides multimedia
communication services over a variety of networks. H.32x is an umbrella
of standards that define all aspects of synchronized voice, video, and
data transmission. It also defines end-to-end call signaling.
MGCP:
MGCP is a method for PSTN gateway control or thin device control.
Specified in RFC 2705, MGCP defines a protocol that controls VoIP
gateways that are connected to external call control devices, referred
to as call agents. MGCP provides the signaling capability for
less-expensive edge devices, such as gateways, that might not have
implemented a full voice-signaling protocol such as H.323. For example,
anytime an event, such as off-hook, occurs on a voice port of a
gateway, the voice port reports that event to the call agent. The call
agent then signals the voice port to provide a service, such as
dial-tone signaling.
SIP:
SIP is a detailed protocol that specifies the commands and responses to
set up and tear down calls. SIP also details features such as security,
proxy, and Transmission Control Protocol (TCP) or User Datagram Protocol
(UDP) services. SIP and its partner protocols, Session Announcement
Protocol (SAP) and Session Description Protocol (SDP), provide
announcements and information about multicast sessions to users on a
network. SIP defines end-to-end call signaling between devices. SIP is
a text-based protocol that borrows many elements of HTTP, using the
same transaction request and response model and similar header and
response codes. It also adopts a modified form of the URL addressing
scheme used within e-mail that is based on Simple Mail Transfer
Protocol (SMTP).
SCCP:
SCCP is a Cisco proprietary protocol used between Cisco Unified
Communications Manager and Cisco IP Phones. The end stations
(telephones) that use SCCP are called Skinny clients, which consume
less processing overhead. The client communicates with Cisco Unified
Communications Manager (often referred to as CallManager and abbreviated
UCM in this chapter) using connection-oriented (TCP-based) communication
to establish a call with another H.323-compliant end station.
The H.323 Umbrella
H.323 is a suite of protocols
defined by the International Telecommunication Union (ITU) for
multimedia conferences over LANs. The H.323 protocol was designed by
the ITU-T and was initially approved in February 1996. It was developed
as a protocol that provides IP networks with traditional telephony
functionality. Today, H.323 is the most widely deployed standards-based
voice and videoconferencing standard for packet-switched networks.
The protocols specified by H.323 include the following:
H.225 Call Signaling:
H.225 call signaling is used to establish a connection between two
H.323 endpoints. This is achieved by exchanging H.225 protocol messages
on the call-signaling channel. The call-signaling channel is opened
between two H.323 endpoints or between an endpoint and an H.323
gatekeeper.
H.225 Registration, Admission, and Status:
Registration, admission, and status (RAS) is the protocol between
endpoints (terminals and gateways) and gatekeepers. RAS is used to
perform registration, admission control, bandwidth changes, status, and
disengage procedures between endpoints and gatekeepers. A RAS channel
is used to exchange RAS messages. This signaling channel is opened
between an endpoint and a gatekeeper prior to the establishment of any
other channels.
H.245 Control Signaling:
H.245 control signaling is used to exchange end-to-end control messages
governing the operation of an H.323 endpoint. These control messages
carry information related to the following:
Audio codecs:
An audio codec encodes the audio signal from a microphone for
transmission by the transmitting H.323 terminal and decodes the
received audio code that is sent to the speaker on the receiving H.323
terminal. Because audio is the minimum service provided by the H.323
standard, all H.323 terminals must have at least one audio codec
supported, as specified in the ITU–T G.711 recommendation (coding audio
at 64 kbps). Additional audio codec recommendations such as G.722 (64,
56, and 48 kbps), G.723.1 (5.3 and 6.3 kbps), G.728 (16 kbps), and
G.729 (8 kbps) might also be supported.
Video codecs:
A video codec encodes video from a camera for transmission by the
transmitting H.323 terminal and decodes the received video code on a
video display of the receiving H.323 terminal. Because H.323 specifies
support of video as optional, the support of video codecs is optional
as well. However, any H.323 terminal providing video communications
must support video encoding and decoding as specified in the ITU–T
H.261 recommendation.
In Cisco IP Communications
environments, H.323 is widely used with gateways, gatekeepers, and
third-party H.323 clients, such as video terminals. Connections are
configured between devices using static destination IP addresses.
Note
Because H.323 is a
peer-to-peer protocol, H.323 gateways are not registered with Cisco
Unified Communications Manager as an endpoint is. An IP address is
configured in the Cisco UCM to confirm that communication is possible.
MGCP
MGCP
is a client/server call control protocol built on a centralized control
architecture. MGCP offers the advantage of centralized gateway
administration and provides for highly scalable IP telephony
solutions. All dial plan information resides on a separate call agent.
The call agent, which controls the ports on the gateway, performs call
control. An MGCP gateway does media translation between the PSTN and
VoIP networks for external calls. In a Cisco-based network,
Communications Managers function as call agents.
MGCP is a plain-text protocol
used by call-control devices to manage IP telephony gateways. MGCP was
defined under RFC 2705, which was updated by RFC 3660, and superseded
by RFC 3435, which was updated by RFC 3661.
With MGCP, Cisco UCM knows of
and controls individual voice ports on an MGCP gateway. This approach
allows complete control of a dial plan from Cisco UCM and gives
Communications Manager per-port control of connections to the PSTN,
legacy PBX, voice-mail systems, and POTS phones. MGCP is implemented
with use of a series of plain-text commands sent via User Datagram
Protocol (UDP) port 2427 between the Cisco UCM and a gateway.
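The division of labor between a gateway and its call agent can be illustrated with a toy model. This sketch is not the MGCP wire format and does not use a network; the class names, port label, event names, and instruction strings are all hypothetical and exist only to show that the decision logic lives in the call agent.

```python
# Toy model of client/server call control. NOT the MGCP message format;
# every name and string below is a hypothetical placeholder.
from dataclasses import dataclass, field


@dataclass
class CallAgent:
    """Holds the call-control logic that the gateway itself lacks."""
    log: list = field(default_factory=list)

    def notify(self, endpoint: str, event: str) -> str:
        """Receive an event notification and return an instruction."""
        self.log.append((endpoint, event))
        if event == "off-hook":
            return "play-dial-tone"
        if event == "on-hook":
            return "release-connection"
        return "no-action"


@dataclass
class Gateway:
    """Reports events on its voice ports and does what it is told."""
    agent: CallAgent

    def port_event(self, port: str, event: str) -> str:
        return self.agent.notify(port, event)


agent = CallAgent()
gateway = Gateway(agent)
print(gateway.port_event("voice-port-1", "off-hook"))
```

If the call agent is unreachable, a gateway modeled this way has no way to decide what to do, which is the dependency that a fallback mechanism has to address.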
It is important to note that
for an MGCP interaction to take place with Cisco UCM, an MGCP gateway
must have Cisco UCM support. If you are a registered customer of the
Software Advisor, you can use this tool to make sure your platform and
your Cisco IOS software or Cisco Catalyst operating system version are
compatible with Cisco UCM for MGCP. Also, make sure your version of
Cisco UCM supports the gateway.
PRI/BRI Backhaul
A Primary Rate Interface
(PRI) and Basic Rate Interface (BRI) backhaul is an internal interface
between the call agent (such as Cisco UCM) and Cisco gateways. It is a
separate channel for backhauling signaling information. A PRI backhaul
forwards PRI Layer 3 (Q.931) signaling information via a TCP connection.
An MGCP gateway is relatively
easy to configure. Because the call agent has all the call-routing
intelligence, you do not need to configure the gateway with all the
dial peers it would otherwise need. A downside is that a call agent
must always be available. Cisco MGCP gateways can use Survivable Remote
Site Telephony (SRST) and MGCP fallback to allow the H.323 protocol to
take over and provide local call routing in the absence of a
Communications Manager (for example, during a WAN outage). In that
case, you must configure dial peers on the gateway for use by H.323.
Session Initiation Protocol
SIP
is a protocol developed by the Internet Engineering Task Force (IETF)
Multiparty Multimedia Session Control (MMUSIC) Working Group as an
alternative to H.323. SIP features are compliant with IETF RFC 2543,
published in March 1999; RFC 3261, published in June 2002; and RFC
3665, published in December 2003. Because SIP is a common standard
based on the logic of the World Wide Web and is very simple to
implement, it is widely used with gateways and proxy servers within
service provider networks for internal and end-customer signaling.
SIP is a peer-to-peer
protocol where user agents (UAs) initiate sessions, similar to H.323.
However, unlike H.323, SIP uses ASCII-text-based messages to
communicate. Therefore, you can implement and troubleshoot SIP very
easily.
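Because SIP messages are plain text, they can be read and parsed with ordinary string handling. The following sketch builds a minimal INVITE and splits it into a request line and headers. The addresses, tag, branch, and Call-ID values are made-up placeholders, and the parser is deliberately simplified (it keeps only the last value of a repeated header and does not handle folded lines).

```python
# Minimal illustration of SIP's text format. All addresses and
# identifiers below are hypothetical placeholders.
SAMPLE_INVITE = "\r\n".join([
    "INVITE sip:bob@biloxi.example.com SIP/2.0",
    "Via: SIP/2.0/UDP pc33.atlanta.example.com;branch=z9hG4bK776asdhds",
    "Max-Forwards: 70",
    "To: Bob <sip:bob@biloxi.example.com>",
    "From: Alice <sip:alice@atlanta.example.com>;tag=1928301774",
    "Call-ID: a84b4c76e66710@pc33.atlanta.example.com",
    "CSeq: 314159 INVITE",
    "Content-Length: 0",
]) + "\r\n\r\n"


def parse_sip_request(message: str) -> dict:
    """Split a SIP request into method, URI, version, headers, and body."""
    head, _, body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return {"method": method, "uri": request_uri, "version": version,
            "headers": headers, "body": body}


request = parse_sip_request(SAMPLE_INVITE)
print(request["method"], request["uri"])
print(request["headers"]["CSeq"])
```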
Because SIP is a peer-to-peer
protocol, the Cisco UCM does not control SIP devices, and SIP devices
do not register with Cisco UCM. As with H.323 gateways, only the IP
address is available on Cisco UCM to confirm that communication between
a Cisco UCM and a SIP voice gateway is possible.
Skinny Client Control Protocol
SCCP is a Cisco
proprietary protocol that is used for the communication between Cisco
UCM and terminal endpoints. SCCP is a client-server protocol, meaning
any event (such as on-hook, off-hook, or buttons pressed) causes a
message to be sent to a Cisco UCM. Cisco UCM then sends specific
instructions back to the device to tell it what to do about the event.
Therefore, each press on a phone button causes data traffic between
Cisco UCM and the terminal endpoint. SCCP is widely used with Cisco IP
Phones. The major advantage of SCCP within Cisco UCM networks is its
proprietary nature, which allows you to make quick changes to the
protocol and add features and functionality.
SCCP is a simplified protocol
used in VoIP networks. Cisco IP Phones that use SCCP can coexist in an
H.323 environment. When used with Cisco Unified Communications Manager, an SCCP
client can interoperate with H.323-compliant terminals.
Comparing VoIP Signaling Protocols
The primary goal for all four
of the previously mentioned VoIP signaling protocols is the same—to
create a bidirectional Real-time Transport Protocol (RTP) stream
between VoIP endpoints involved in a conversation. However, VoIP
signaling protocols use different architectures and procedures to
achieve this goal.
H.323
H.323
is considered a peer-to-peer protocol, although H.323 is not a single
protocol. Rather, it is a suite of protocols. The necessary gateway
configuration is relatively complex, because you need to define the
dial plan and route patterns directly on the gateway. Examples of
H.323-capable devices are the Cisco VG224 Analog Phone Gateway and the
Cisco 2600XM Series, Cisco 2800 Series, 3700 Series, and 3800 Series
routers.
The H.323 protocol
is responsible for all the signaling between a Cisco UCM cluster and an
H.323 gateway. The ISDN protocols, Q.921 and Q.931, are used only on
the Integrated Services Digital Network (ISDN) link to the PSTN, as
illustrated in Figure 1-2.

MGCP
The MGCP protocol is
based on a client/server architecture. That simplifies the
configuration because the dial plan and route patterns are defined
directly on a Cisco UCM server within a cluster. Examples of
MGCP-capable devices are the Cisco VG224 Analog Phone Gateway and the
Cisco 2600XM Series, 2800 Series, 3700 Series, and 3800 Series routers.
Non-IOS MGCP gateways include the Cisco Catalyst 6608-E1 and Catalyst
6608-T1 module.
MGCP is used to manage a
gateway. All ISDN Layer 3 information is backhauled to a Cisco UCM
server. Only the ISDN Layer 2 information (Q.921) is terminated on the
gateway, as depicted in Figure 1-3.

SIP
Like H.323, SIP is a peer-to-peer protocol. The
configuration necessary for the gateway is relatively complex because
the dial plan and route patterns need to be defined directly on the
gateway. Examples of SIP-capable devices are the Cisco 2800 Series and
3800 Series routers.
The SIP protocol is
responsible for all the signaling between a Cisco UCM cluster and a
gateway. The ISDN protocols, Q.921 and Q.931, are used only on an ISDN
link to the PSTN, as illustrated in Figure 1-4.

SCCP
SCCP works in a client/server architecture, as shown in Figure 1-5,
which simplifies the configuration of SCCP devices such as Cisco IP
Phones and Cisco ATA 180 Series and VG200 Series FXS gateways.

SCCP is also used on Cisco VG224 and VG248 analog phone gateways and on
ATAs. These devices communicate with Cisco UCM using SCCP and then use
standard analog signaling toward the analog device connected to the FXS
port. Recent versions of Cisco IOS voice gateways (for example, the 2800
Series) also support SCCP-controlled Foreign Exchange Station (FXS)
ports.
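The comparison in this section can be condensed into a small lookup structure. This is only a summary sketch of the statements above; the dictionary keys and the helper name are illustrative, and the SCCP entry reflects the client/server description rather than an explicit statement about where its dial plan resides.

```python
# Summary of the signaling-protocol comparison as a lookup table.
# Key names and the helper function are illustrative only.
SIGNALING_PROTOCOLS = {
    "H.323": {"model": "peer-to-peer", "dial_plan_on": "gateway"},
    "SIP": {"model": "peer-to-peer", "dial_plan_on": "gateway"},
    "MGCP": {"model": "client/server", "dial_plan_on": "call agent"},
    "SCCP": {"model": "client/server", "dial_plan_on": "call agent"},
}


def needs_local_dial_plan(protocol: str) -> bool:
    """True when the gateway itself must be configured with the dial plan."""
    return SIGNALING_PROTOCOLS[protocol]["dial_plan_on"] == "gateway"


for name in SIGNALING_PROTOCOLS:
    print(name, SIGNALING_PROTOCOLS[name]["model"], needs_local_dial_plan(name))
```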
VoIP Service Considerations
In
traditional telephony networks, dedicated bandwidth for each voice
stream gives voice a fixed, predictable delay across the network.
Because bandwidth is guaranteed in a TDM environment, no variable delay
(that is, jitter) exists.
Configuring voice in a data network requires network services with low
delay, minimal jitter, and minimal packet loss. Bandwidth requirements
must be properly calculated based on the codec used and the number of
concurrent connections. QoS must be configured to minimize jitter and
loss of voice packets. The PSTN provides 99.999 percent availability
(that is, the five nines of availability).
To match the availability of the PSTN, an IP network must be designed
with redundancy and failover mechanisms. Security policies must be
established to address both network stability and voice-stream security.
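To put the five-nines figure in perspective, the following sketch converts an availability percentage into downtime per year and shows how redundancy changes the combined figure. It assumes a 365-day year and independent component failures; the 99.9 percent link value is a made-up example, and the function names are illustrative.

```python
# Availability arithmetic. Assumptions: 365-day year, independent failures.
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes per year for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability)


def series_availability(*components: float) -> float:
    """All components must work: availabilities multiply."""
    return math.prod(components)


def parallel_availability(*components: float) -> float:
    """Redundant components: the path fails only if every one fails."""
    return 1 - math.prod(1 - a for a in components)


print(f"99.999%: {downtime_minutes_per_year(0.99999):.2f} minutes per year")
print(f"Two 99.9% links in series:   {series_availability(0.999, 0.999):.6f}")
print(f"Two 99.9% links in parallel: {parallel_availability(0.999, 0.999):.6f}")
```

The series case shows why every added device on a single path lowers availability, and the parallel case shows why redundant links and hardware are needed to approach PSTN levels.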
Table 1-1 lists issues associated with implementing VoIP in a converged network and solutions that address these issues.
Table 1-1. Issues and Solutions for VoIP in a Converged Network
| Issue | Solutions |
|---|---|
| Latency | Increase bandwidth. Choose a different codec type. Fragment data packets. Prioritize voice packets. |
| Jitter | Use dejitter buffers. Prioritize voice packets. |
| Bandwidth | Calculate bandwidth requirements, including voice payload, overhead, and data. |
| Packet loss | Design the network to minimize congestion. Prioritize voice packets. Use codecs to minimize small amounts of packet loss. |
| Reliability | Provide redundancy for hardware, links, and power (uninterruptible power supply [UPS]). Perform proactive network management. |
| Security | Secure the following components: network infrastructure, call-processing systems, endpoints, and applications. |
Media Transmission Protocols
In
a VoIP network, the actual voice data (conversations) are transported
across the transmission media using RTP and RTP Control Protocol
(RTCP). RTP defines a standardized packet format for delivering audio
and video over the Internet. RTCP is a companion protocol to RTP as it
provides for the delivery of control information for individual RTP
streams. Compressed Real-time Transport Protocol (cRTP) and Secure
Real-time Transport Protocol (sRTP) were developed to enhance the usage
of RTP.
Datagram protocols, such as UDP,
send a media stream as a series of small packets. This approach is
simple and efficient. However, packets are liable to be lost or
corrupted in transit. Depending on the protocol and the extent of the
loss, a client might be able to recover lost data with error correction
techniques, might interpolate over the missing data, or might suffer a
data dropout. RTP and the RTCP were specifically designed to stream
media over networks. They are both built on top of UDP.
Real-Time Transport Protocol
RTP defines a standardized
packet format for delivering audio and video over the Internet. It was
developed by the Audio-Video Transport Working Group of the IETF and
was first published in 1996 as RFC 1889, which was made obsolete in
2003 by RFC 3550.
RTP provides end-to-end
network transport functions intended for applications with real-time
transmission requirements, such as audio and video. Those functions
include payload-type identification, sequence numbering, time stamping,
and delivery monitoring. Figure 1-6
shows a typical role played by RTP in a VoIP network. Specifically,
notice RTP communicates directly between the voice endpoints, whereas
the call setup protocols (that is, H.225 and H.245 in this example) are
used to communicate with voice gateways.

RTP
typically runs on top of UDP to use the multiplexing and checksum
services of that protocol. RTP does not have a standard TCP or UDP port
on which it communicates. The only standard it obeys is that UDP
communications are done via an even port, and the next higher odd port
is used for RTCP communications. Although no standards are assigned, in
a Cisco environment RTP is generally configured to use UDP ports in the
range 16,384–32,767.
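The even/odd port convention described above can be sketched in a few lines of Python. This is only an illustration: the function name and the constants are hypothetical, and the range check simply mirrors the Cisco range quoted in the text rather than any particular device's configuration.

```python
# Hypothetical constants mirroring the Cisco UDP range quoted in the text.
CISCO_RTP_PORT_MIN = 16384
CISCO_RTP_PORT_MAX = 32767


def rtp_rtcp_port_pair(rtp_port):
    """Return (rtp_port, rtcp_port) using the even/odd convention.

    RTP takes an even UDP port; RTCP takes the next higher odd port.
    """
    if rtp_port % 2 != 0:
        raise ValueError("RTP is expected on an even-numbered port")
    if not CISCO_RTP_PORT_MIN <= rtp_port < CISCO_RTP_PORT_MAX:
        raise ValueError("port is outside the assumed 16384-32767 range")
    return rtp_port, rtp_port + 1


if __name__ == "__main__":
    print(rtp_rtcp_port_pair(16384))
```

Because each direction of a call needs its own RTP and RTCP ports, a firewall rule set built on this convention has to allow both members of every pair.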
RTP can carry any data with
real-time characteristics, such as interactive audio or video. The fact
that RTP uses a dynamic port range can make it difficult for it to
traverse firewalls.
Although RTP is often used
for unicast sessions, it is primarily designed for multicast sessions.
In addition to the roles of sender and receiver, RTP defines the roles
of translator and mixer to support multicast requirements.
RTP is frequently used in
conjunction with Real-time Streaming Protocol (RTSP) in streaming media
systems. RTP is also used in conjunction with H.323 or SIP in
videoconferencing and push-to-talk systems. This breadth of use makes
RTP the technical foundation of the VoIP industry. Applications using
RTP can tolerate some packet loss but are typically very sensitive to
delay, so UDP is a better choice than TCP for such applications.
RTP is a critical
component of VoIP because it enables the destination device to reorder
and retime the voice packets before they are played out to the user. An
RTP header contains a time stamp and sequence number, which allow the
receiving device to buffer and to remove jitter by synchronizing the
packets to play back a continuous stream of sound. RTP uses sequence
numbers only to order the packets. RTP does not request retransmission
if a packet is lost.
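To make the role of the sequence number and time stamp concrete, the following Python sketch packs and parses the 12-byte fixed RTP header (RFC 3550 layout) and sorts received packets by sequence number before playout. The helper names are hypothetical, the header carries no CSRC entries or extensions, and the sort ignores 16-bit sequence wraparound, so treat it as a simplified illustration rather than a jitter-buffer implementation.

```python
import struct

# Fixed RTP header: V/P/X/CC byte, M/PT byte, 16-bit sequence number,
# 32-bit time stamp, 32-bit SSRC (12 bytes in network byte order).
RTP_HEADER = struct.Struct("!BBHII")


def build_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Pack a minimal RTP header (version 2, no padding, no CSRCs)."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(byte0, byte1, sequence & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet):
    """Unpack the fixed header fields from the start of a packet."""
    byte0, byte1, sequence, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def playout_order(packets):
    """Reorder packets by sequence number (wraparound is ignored here)."""
    return sorted(packets, key=lambda p: parse_rtp_header(p)["sequence"])


if __name__ == "__main__":
    # Assumed example values: payload type 0, 160 time stamp units per packet.
    arrived = [build_rtp_header(0, seq, seq * 160, 0x1234) + b"audio"
               for seq in (2, 1, 3)]
    print([parse_rtp_header(p)["sequence"] for p in playout_order(arrived)])
```

A gap in the sorted sequence numbers tells the receiver that a packet was lost; consistent with the text, nothing in the sketch asks the sender to retransmit it.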
RTP Control Protocol
RTCP is a sister protocol of
RTP. It was first defined in RFC 1889, which was made obsolete by RFC
3550. RTCP provides out-of-band control information for an RTP flow. It
works alongside RTP in the delivery and packaging of multimedia data,
but does not transport any media data itself. Although RTCP is periodically
used to transmit control packets to participants in a streaming
multimedia session, the primary function of RTCP is to provide feedback
on the quality of service being provided by RTP.
RTCP is used for QoS
reporting. It gathers statistics on a media connection and information
such as bytes sent, packets sent, lost packets, jitter, feedback, and
round-trip delay. Applications use this information to increase the
quality of service, perhaps using a low-compression codec instead of a
high-compression codec.
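As an illustration of one of these statistics, the sketch below computes the running interarrival jitter estimate in the way RFC 3550 describes it: for each pair of consecutive packets, the difference in arrival spacing versus sending spacing is smoothed with a gain of 1/16. The function name and the sample values are hypothetical, and both lists are assumed to be expressed in the same time stamp units.

```python
def interarrival_jitter(send_times, receive_times):
    """Return the smoothed interarrival jitter for one packet stream.

    For consecutive packets, D is the change in transit time:
        D = (R[i] - R[i-1]) - (S[i] - S[i-1])
    and the estimate is updated as J = J + (|D| - J) / 16.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        d = ((receive_times[i] - receive_times[i - 1])
             - (send_times[i] - send_times[i - 1]))
        jitter += (abs(d) - jitter) / 16.0
    return jitter


if __name__ == "__main__":
    # Assumed sample: packets sent every 160 units, third one arrives 16 late.
    print(interarrival_jitter([0, 160, 320], [0, 160, 336]))
```

An application that sees this value climbing in receiver reports could react, for example, by enlarging its playout buffer or changing codecs.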
There are several
types of RTCP packets: Sender Report Packet, Receiver Report Packet,
Source Description RTCP Packet, Goodbye RTCP Packet, and
application-specific RTCP packets.
RTCP provides the following feedback on current network conditions:
RTCP provides a
mechanism for hosts involved in an RTP session to exchange information
about monitoring and controlling the session. RTCP monitors the quality
of elements such as packet count, packet loss, delay, and interarrival
jitter. RTCP transmits packets as a percentage of session bandwidth,
but at a specific rate of at least every five seconds.
The
RTP standard states that the Network Time Protocol (NTP) time stamp is
based on synchronized clocks. The corresponding RTP time stamp starts
from a random value and advances with the sampling of the data
packets. Both the NTP and RTP time stamps are included in RTCP packets
by the sender of the data.
RTCP
provides a separate flow from RTP. When a voice stream is assigned UDP
port numbers, RTP is typically assigned an even-numbered port and RTCP
is assigned the next odd-numbered port. Each voice call has four ports
assigned: RTP plus RTCP in the transmit direction and RTP plus RTCP in
the receive direction.
Compressed RTP
RTP includes a data portion and
a header portion. The data portion of RTP is a thin protocol that
provides support for the real-time properties of applications, such as
continuous media, including timing reconstruction, loss detection, and
content identification. The header portion of RTP is considerably
larger than the data portion. The header portion consists of the IP
segment, the UDP segment, and the RTP segment. Given the size of the
IP/UDP/RTP segment combinations, it is inefficient to send the
IP/UDP/RTP header without compressing it. Figure 1-7
illustrates using cRTP over a relatively low-speed WAN link
(such as a T1 link), which could benefit from the bandwidth freed up by
compressing the IP/UDP/RTP header.

The header portion
consists of an IP segment, a UDP segment, and an RTP segment. The
minimal 20 bytes of the IP segment, combined with the 8 bytes of the
UDP segment and the 12 bytes of the RTP segment, create a 40-byte
IP/UDP/RTP header. The RTP packet has a payload of approximately 20 to
150 bytes for audio applications that use compressed payloads.
The RTP header compression
feature compresses the IP/UDP/RTP header in an RTP data packet from 40
bytes to approximately 2 to 4 bytes.
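A rough calculation shows why shrinking the header matters. The Python sketch below compares per-call bandwidth with a full 40-byte header and with a 2-byte compressed header. The 20-byte payload and the rate of 50 packets per second are assumed example values (typical of a highly compressed codec with 20 ms packetization), the function name is hypothetical, and Layer 2 overhead is deliberately ignored.

```python
# 20-byte IP + 8-byte UDP + 12-byte RTP, as described in the text.
FULL_HEADER_BYTES = 20 + 8 + 12


def voice_bandwidth_kbps(payload_bytes, header_bytes, packets_per_second):
    """Return the bandwidth of one voice stream in kbps (Layer 3 and up)."""
    bits_per_packet = (payload_bytes + header_bytes) * 8
    return bits_per_packet * packets_per_second / 1000


if __name__ == "__main__":
    # Assumed example: 20-byte payload, 50 packets per second.
    full = voice_bandwidth_kbps(20, FULL_HEADER_BYTES, 50)
    compressed = voice_bandwidth_kbps(20, 2, 50)
    print(full, compressed)  # 24.0 8.8
```

Under these assumptions the compressed stream needs roughly a third of the bandwidth of the uncompressed one, which is why the saving is most visible when payloads are small.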
cRTP, specified in RFCs 2508, 2509, and 3545, was developed to decrease the size of the IP, UDP, and RTP headers.
RFC 2508: Compressing IP/UDP/RTP Headers for Low-Speed Serial Links
RFC 2509: IP Header Compression over PPP
RFC 3545: Enhanced Compressed RTP (ECRTP) for Links with High Delay, Packet Loss and Reordering
cRTP as specified in RFC 2508 was designed to
work with reliable and fast point-to-point links. In less than optimal
circumstances, where there might be long delays, packet loss, and
out-of-sequence packets, cRTP doesn't function well for VoIP
applications. Another adaptation, ECRTP, was defined in RFC 3545
to overcome that problem.
RTP header compression is
supported on serial lines using Frame Relay, HDLC, or PPP
encapsulation. It is also supported over ISDN interfaces.
Why and When to Use cRTP
cRTP does not
technically perform compression. Rather, cRTP leverages the fact that
much of the header information in every packet in a VoIP stream
contains redundant information, and cRTP then suppresses the sending of
that redundant information. For example, after a VoIP call flow is
established, every packet has the same source and destination IP
addresses, the same source and destination UDP port numbers, and the
same RTP payload type. By caching this redundant information in the
gateways at each end of a link, sending reduced headers, and then
reassembling the full header, cRTP can achieve significant bandwidth
savings without any loss of information.
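The caching idea can be sketched with a toy model in Python. This is not the RFC 2508 wire format: headers are plain dictionaries, the class and field names are hypothetical, and the changing fields are sent in full rather than as deltas. The sketch only shows how both ends of a link can agree on a cached context so that the static fields are sent once.

```python
# Fields that stay constant for the life of a call (per the text).
STATIC_FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port", "payload_type")


class Compressor:
    """Sending side: suppress static fields once the peer has cached them."""

    def __init__(self):
        self.contexts = {}  # context ID -> cached static fields

    def compress(self, context_id, header):
        static = {name: header[name] for name in STATIC_FIELDS}
        if self.contexts.get(context_id) != static:
            # New or changed flow: send everything and refresh the cache.
            self.contexts[context_id] = static
            return ("FULL", context_id, dict(header))
        changing = {k: v for k, v in header.items() if k not in STATIC_FIELDS}
        return ("COMPRESSED", context_id, changing)


class Decompressor:
    """Receiving side: rebuild the full header from the cached context."""

    def __init__(self):
        self.contexts = {}

    def decompress(self, message):
        kind, context_id, fields = message
        if kind == "FULL":
            self.contexts[context_id] = {n: fields[n] for n in STATIC_FIELDS}
            return dict(fields)
        header = dict(self.contexts[context_id])
        header.update(fields)
        return header
```

In this model the first packet of a flow travels with its full header, and every later packet carries only the context ID plus the fields that change, yet the receiver still reconstructs an identical header.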
RTP header compression
also reduces overhead for multimedia RTP traffic. The reduction in
overhead for multimedia RTP traffic results in a corresponding
reduction in delay. RTP header compression is especially beneficial
when the RTP payload size is small; for example, for compressed audio
payloads of 20 to 50 bytes.
Use RTP header
compression on any WAN interface where you are concerned about
bandwidth and where there is a high proportion of RTP traffic. RTP header
compression can be used for media-on-demand and interactive services
such as Internet telephony. RTP header compression provides support for
real-time conferencing of groups of any size within the Internet. This
support includes source identification support for gateways such as
audio and video bridges and support for multicast-to-unicast
translators. RTP header compression can benefit both telephony voice
and multicast backbone (MBONE) applications running over slow links.
Note
Using
RTP header compression on any high-speed interfaces (that is, anything
over T1 speed) is not recommended. Any bandwidth savings achieved with
RTP header compression might be offset by an increase in CPU
utilization on the router.
Secure RTP
sRTP was first published by
the IETF in March 2004 as RFC 3711; it was designed to provide
encryption, message authentication and integrity, and replay protection
to RTP data in both unicast and multicast applications.
sRTP also has a sister
protocol, called Secure RTCP (sRTCP). sRTCP provides the same
security-related features to RTCP as the ones provided by sRTP to RTP.
sRTP can be used in conjunction with compressed RTP. Figure 1-8 demonstrates that an sRTP flow travels between devices (Cisco IP phones in Figure 1-8), which are capable of sending and receiving sRTP traffic.

Flow Encryption
sRTP standardizes
utilization of only a single cipher, Advanced Encryption Standard
(AES), which can be used in two cipher modes, which turn the original
block AES cipher into a stream cipher:
Segmented Integer Counter Mode:
A counter mode that allows random access to any blocks and that is
essential for RTP traffic running over unreliable networks with
possible loss of packets. AES running in this mode is the default
encryption algorithm, with a default encryption key length of 128 bits
and a default session salt key length of 112 bits.
f8-mode:
A variation of output feedback mode. The default values of the
encryption key and salt key are the same as for AES in Counter Mode.
In addition to the AES
cipher, sRTP gives the user the ability to disable encryption outright,
using the so-called NULL cipher. The NULL cipher does not
perform any encryption. Rather, the encryption algorithm functions as
though the key stream contains only zeroes, and it copies the input
stream to the output stream without any changes.
Note
It
is mandatory for the NULL cipher mode to be implemented in any
sRTP-compatible system. As such, it can be used when the
confidentiality guarantees ensured by sRTP are not required, and other
sRTP features (such as authentication and message integrity) might be used.
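The NULL cipher behavior follows directly from how a stream cipher works: the payload is combined with a key stream by XOR, so a key stream of all zeroes leaves the payload unchanged. The Python sketch below demonstrates only that property. The function name is hypothetical, and the nonzero key stream is a fixed placeholder, not output from AES, so it offers no security.

```python
def xor_with_keystream(data, keystream):
    """Combine data with a key stream byte by byte using XOR."""
    if len(keystream) < len(data):
        raise ValueError("key stream must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, keystream))


if __name__ == "__main__":
    payload = b"voice sample"

    # NULL cipher: an all-zero key stream copies input to output unchanged.
    null_keystream = bytes(len(payload))
    print(xor_with_keystream(payload, null_keystream) == payload)

    # Placeholder key stream (NOT generated by AES): applying it twice
    # restores the payload, which is how the receiver decrypts.
    placeholder = bytes(range(1, len(payload) + 1))
    encrypted = xor_with_keystream(payload, placeholder)
    print(xor_with_keystream(encrypted, placeholder) == payload)
```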
Because encryption
algorithms do not secure message integrity themselves, allowing the
attacker to either forge the data or at least to replay previously
transmitted data, sRTP also provides the means to secure the integrity
of data and safety from replay.
Authentication and Integrity
The HMAC-SHA1 algorithm
(defined in RFC 2104) is used to authenticate a message and protect its
integrity. This algorithm produces a 160-bit result, which is then
truncated to 80 bits to become the authentication tag, which is then
appended to a packet. The HMAC is calculated over the packet payload
and material from the packet header, including the packet sequence
number.
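The truncation step can be illustrated with Python's standard hmac and hashlib modules. This is a simplified sketch: the function names are hypothetical, the key is an arbitrary example value, and the bytes being authenticated stand in for the header and payload material that a real sRTP implementation would assemble (key derivation is omitted entirely).

```python
import hashlib
import hmac

TAG_BYTES = 10  # 80 bits, as described in the text


def authentication_tag(auth_key, authenticated_portion):
    """Compute HMAC-SHA1 (160 bits) and truncate it to an 80-bit tag."""
    full_digest = hmac.new(auth_key, authenticated_portion,
                           hashlib.sha1).digest()  # 20 bytes
    return full_digest[:TAG_BYTES]


def verify_tag(auth_key, authenticated_portion, received_tag):
    """Recompute the tag and compare it in constant time."""
    expected = authentication_tag(auth_key, authenticated_portion)
    return hmac.compare_digest(expected, received_tag)


if __name__ == "__main__":
    key = b"example-auth-key"            # assumed example key
    packet = b"header-bytes" + b"payload-bytes"
    tag = authentication_tag(key, packet)
    print(len(tag), verify_tag(key, packet, tag))
    print(verify_tag(key, packet + b"tampered", tag))
```

Any change to the authenticated bytes produces a different tag, so a receiver that recomputes the tag can reject forged or modified packets.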
Replay Protection
To protect against replay
attacks, a receiver must maintain the indices of previously received
messages, comparing them with the index of each newly received message
and admitting the new message only if it has not been played before.
Such an approach heavily relies on integrity protection being enabled
(to make it nearly impossible to spoof message indices).
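One common way to keep track of previously received indices is a sliding window backed by a bitmap, sketched below in Python. The class name and the window size of 64 are assumptions made for the example, and the sketch presumes the packet has already passed the integrity check before the window is updated.

```python
class ReplayWindow:
    """Track which packet indices have been seen within a sliding window."""

    def __init__(self, size=64):
        self.size = size
        self.highest = -1   # highest index accepted so far
        self.bitmap = 0     # bit i set => index (highest - i) was seen

    def check_and_update(self, index):
        """Return True and record the index if it is new; else False."""
        if index > self.highest:
            # Slide the window forward and mark the new highest index.
            shift = index - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = index
            return True
        offset = self.highest - index
        if offset >= self.size:
            return False    # too old to be tracked; treat as a replay
        if self.bitmap & (1 << offset):
            return False    # already received: replay
        self.bitmap |= 1 << offset
        return True


if __name__ == "__main__":
    window = ReplayWindow()
    print([window.check_and_update(i) for i in (0, 2, 1, 1, 0)])
```

Packets that arrive out of order but within the window are still accepted once, whereas duplicates and packets older than the window are rejected.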