VoIP Lexicon
This lexicon is a reference guide with brief summaries of VoIP/SIP terms. Instructions for configuring devices can be found here.
VoIP / MoIP
VoIP stands for "Voice over IP" and describes a method of transmitting voice over a computer network. It is also possible to transmit video streams and messages, in which case it is referred to as MoIP, "Multimedia over IP."
VoIP encompasses various transmission techniques, but here we focus specifically on transmission via the SIP protocol.
Network Basics
Protocols
IPv4 / IPv6
Internet Protocol version 4 (IPv4) is the foundation of today's IP networks and the Internet. It belongs to OSI Layer 3 (Network Layer). An IPv4 address consists of 4 bytes but is typically represented as four decimal numbers separated by dots, with values ranging from 0 to 255. Every IP address used on the public Internet must be globally unique. The 4-byte address space theoretically provides over 4 billion addresses, though fewer are effectively usable.
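Dotted-decimal notation is only a readable way of writing one 32-bit number. The following is a minimal Python sketch of the conversion in both directions. The function names are illustrative. The standard library's `ipaddress` module offers the same conversion with more validation, for example `int(ipaddress.IPv4Address("192.168.0.11"))`.

```python
def ipv4_to_int(address: str) -> int:
    """Convert dotted-decimal notation (e.g. '192.168.0.11') to a 32-bit integer."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 octets, got {len(parts)}: {address!r}")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"octet out of range 0-255: {part!r}")
        value = (value << 8) | octet  # shift the previous octets left, append this one
    return value


def int_to_ipv4(value: int) -> str:
    """Convert a 32-bit integer back to dotted-decimal notation."""
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in 32 bits")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(ipv4_to_int("192.168.0.11"))  # 3232235531
print(int_to_ipv4(3232235531))      # 192.168.0.11
print(2**32)                        # 4294967296 theoretically possible addresses
```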
Internet Protocol version 6 (IPv6) is increasingly being used instead of IPv4. It is an evolution of IPv4 that offers several advantages. Most importantly, it massively expands the available address space by using 128-bit (16-byte) addresses instead of 32-bit ones.
Since there are currently almost no VoIP devices that support IPv6, we primarily focus on IPv4 in this document.
Addresses on the Internet
Often, only an IP address is specified as a destination on the Internet.
However, please note that a complete address always includes a port as well (e.g., 192.168.0.11:5060).
Addresses in the LAN
The following private IP address ranges are reserved for use in a LAN (RFC 1918):
10.0.0.0/8
172.16.0.0/12
192.168.0.0/16
Addresses in a LAN must be taken from these ranges. Otherwise, they may conflict with addresses on the Internet.
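Whether an address falls into one of these ranges can be checked with Python's standard `ipaddress` module. This is a minimal sketch. The three networks are listed explicitly because `ipaddress`'s own `is_private` flag also covers other special ranges, such as loopback and link-local addresses.

```python
import ipaddress

# The three private IPv4 ranges listed above (RFC 1918).
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address lies in one of the private LAN ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)


# 172.16.0.0/12 spans 172.16.0.0 to 172.31.255.255, so 172.32.0.1 is already public.
for candidate in ("192.168.0.11", "172.31.255.1", "172.32.0.1", "80.70.60.51"):
    print(candidate, is_rfc1918(candidate))
```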
TCP
Transmission Control Protocol
TCP is a transport protocol that provides a reliable end-to-end transport connection. If the receipt of a segment is not acknowledged within a certain time (the retransmission timeout), the source resends that TCP segment. Because TCP is not designed for real-time transmission, it plays only a limited role in VoIP, mainly for session signaling. Using TCP with SIP is optional, and UDP is the preferred choice. TCP is largely unsuitable for real-time data such as voice packets, because retransmissions can cause unacceptable delays.
UDP
User Datagram Protocol
UDP is a simple, connectionless transport protocol that carries datagrams end to end but, unlike TCP, does not guarantee their delivery. Because there is no connection setup and no acknowledgment, UDP datagrams are transmitted as quickly as possible. There are no delays caused by, for example, packet retransmissions. This makes UDP well suited to real-time communication.
RTP / RTCP
Real-time Transport Protocol
RTP is a transport protocol for transmitting real-time audio and video data. RTP packets are typically transported via UDP to avoid overhead.
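Each RTP packet starts with a 12-byte fixed header (RFC 3550). It contains the version, payload type (codec), sequence number, timestamp and a source identifier (SSRC). The following minimal Python sketch packs such a header with `struct`. The function name and example values are illustrative. Payload type 8 is the static RTP payload type for PCMA (G.711 A-law), and 20 ms of audio at 8 kHz corresponds to 160 samples.

```python
import struct


def build_rtp_header(payload_type: int, sequence: int, timestamp: int, ssrc: int,
                     marker: bool = False) -> bytes:
    """Pack the 12-byte fixed RTP header (version 2, no padding, no extension, no CSRCs)."""
    first_byte = 2 << 6                                   # V=2, P=0, X=0, CC=0
    second_byte = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",                  # network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes
        first_byte,
        second_byte,
        sequence & 0xFFFF,         # 16-bit sequence number, wraps around
        timestamp & 0xFFFFFFFF,    # 32-bit media timestamp (in samples)
        ssrc & 0xFFFFFFFF,         # 32-bit synchronization source identifier
    )


# Example: PCMA (payload type 8), 20 ms of audio at 8 kHz = 160 samples per packet.
header = build_rtp_header(payload_type=8, sequence=1, timestamp=160, ssrc=0x12345678)
packet = header + bytes(160)       # placeholder payload (zero bytes, not real A-law audio)
print(len(header), header.hex())
```

A receiver uses the sequence number to detect lost or reordered packets and uses the timestamp to play the audio back at the right moment.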
RTP Control Protocol
RTCP is a protocol for controlling RTP sessions. It complements RTP by exchanging quality-of-service information between the sender and receiver.
SIP
Session Initiation Protocol
The SIP protocol is used to transmit the signaling messages that establish communication relationships ("sessions") in the VoIP domain. Besides participant and signaling information, SIP messages carry two further kinds of data. The first is the media types within a session (VoIP, "Video over IP," or other multimedia applications). The second is the parameters required for encoding and decoding the multimedia data (e.g., the codecs used). Together with the supporting SDP protocol, SIP provides the complete communication foundation for setting up, modifying, and terminating a session. Compared to the H.323 protocol suite, whose call signaling is derived from ISDN, SIP offers the advantages of simplicity, extensibility, and readability.
SIP can be transported via either UDP or TCP. As a signaling and session control protocol, SIP uses its own handshake, retransmission, and timeout mechanisms to ensure reliable delivery, so it already behaves in a connection-oriented manner. There is therefore no need for a connection-oriented transport protocol, and connectionless UDP is typically used as the transport protocol for SIP. Unlike connection-oriented protocols such as TCP and SCTP, UDP requires no connection setup before SIP signaling data is exchanged and performs no flow control during transmission. Using UDP as the transport protocol for SIP therefore saves time and bandwidth.
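RFC 3261 defines these retransmission mechanisms as timers. For an INVITE sent over UDP, Timer A starts at T1 (500 ms by default) and doubles after each retransmission. Timer B (64 * T1 = 32 s) ends the attempt if no response arrives. A provisional response such as "100 Trying" stops the retransmissions. The following Python sketch only computes that schedule. The function name is illustrative, and it does not send anything.

```python
T1 = 0.5  # seconds; RFC 3261 default round-trip time estimate


def invite_retransmit_times(t1: float = T1) -> list[float]:
    """Times (s after the first send) at which an INVITE is resent over UDP if no response arrives."""
    timer_b = 64 * t1          # transaction timeout
    times = []
    interval = t1              # Timer A starts at T1 ...
    elapsed = interval
    while elapsed < timer_b:
        times.append(elapsed)
        interval *= 2          # ... and doubles after every retransmission
        elapsed += interval
    return times


print(invite_retransmit_times())  # [0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
print(64 * T1)                    # 32.0: the transaction gives up after 32 s
```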
SIP's message structure is largely based on the HTTP standard, and SIP messages are encoded in ASCII-compatible UTF-8 (Universal Character Set Transformation Format). As a result, they can be displayed as plain text without special decoders.
SDP
Session Description Protocol
SDP is used to describe the media transmitted within a "Multimedia over IP" session. In addition to the media types (audio, video, etc.), it transmits the connection parameters (IP address and port) and the available codecs (G.722, G.711, H.264, etc.). SDP is carried in the body of the SIP message.
STUN
Session Traversal Utilities for NAT
With the help of the STUN protocol, devices in private networks can determine their public network address (IP address and port) and automatically insert it into SIP header fields and SDP parameters.
However, activating STUN is only necessary if specified by the provider; normally, it is not required.
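At the protocol level, a device asks a STUN server with a 20-byte Binding Request (RFC 5389). The server answers with an XOR-MAPPED-ADDRESS attribute containing the public IP address and port it observed. The following minimal Python sketch builds the request and decodes an IPv4 attribute value. The function names are illustrative, and parsing of the full response is omitted.

```python
import os
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001


def build_binding_request() -> tuple[bytes, bytes]:
    """Return (message, transaction_id) for a STUN Binding Request without attributes."""
    transaction_id = os.urandom(12)  # 96-bit random transaction ID
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)  # type, length, magic cookie
    return header + transaction_id, transaction_id


def parse_xor_mapped_address(value: bytes) -> tuple[str, int]:
    """Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute into (public ip, public port)."""
    _reserved, family, xport = struct.unpack("!BBH", value[:4])
    if family != 0x01:
        raise ValueError("only IPv4 is handled in this sketch")
    port = xport ^ (MAGIC_COOKIE >> 16)        # port is XORed with the upper 16 bits of the cookie
    (xaddr,) = struct.unpack("!I", value[4:8])
    addr = xaddr ^ MAGIC_COOKIE                # address is XORed with the whole cookie
    ip = ".".join(str((addr >> s) & 0xFF) for s in (24, 16, 8, 0))
    return ip, port
```

In practice, the request would be sent over UDP to a STUN server (default port 3478) from the same socket the device uses for SIP. The public address then appears in the response attribute of type 0x0020.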
DNS
Domain Name System
DNS is a hierarchical distributed naming system used to translate domain names (e.g., example.com) into IP addresses (e.g., 192.0.2.1) and vice versa. It plays a crucial role in enabling users to access resources on the internet using human-readable domain names instead of numerical IP addresses.
During the DNS resolution process, when a user enters a domain name into a web browser or another network application, the application sends a DNS query to a DNS resolver, which is typically provided by the user's Internet Service Provider (ISP) or a public DNS resolver service. The resolver then initiates the resolution process by querying authoritative DNS servers to obtain the corresponding IP address for the specified domain name.
The DNS hierarchy consists of various types of DNS servers, including:
- Root DNS Servers: These servers are at the top of the DNS hierarchy and refer queries to the Top-Level Domain (TLD) DNS servers.
- Top-Level Domain (TLD) DNS Servers: These servers are responsible for managing domain names within specific top-level domains such as .com, .org, and .net, as well as country-specific TLDs like .ch, .at, and .de.
- Authoritative DNS Servers: These servers store DNS records (such as A records, AAAA records, MX records, etc.) for specific domain names and are responsible for providing authoritative responses to DNS queries.
DNS operates over UDP or TCP on port 53. UDP is typically used for DNS queries, while TCP is used for large DNS responses or zone transfers.
Overall, DNS is a fundamental component of internet infrastructure, providing essential name resolution services that enable seamless communication and access to online resources.
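A DNS query is a compact binary message (RFC 1035). It consists of a 12-byte header followed by the question. In the question, the domain name is encoded as length-prefixed labels. The following minimal Python sketch builds a query for the A record (IPv4 address) of a domain. The function names and the fixed query ID are illustrative.

```python
import struct


def encode_qname(domain: str) -> bytes:
    """Encode a domain name as DNS labels: each part prefixed by its length, ending with a zero byte."""
    encoded = b""
    for label in domain.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label length: {label!r}")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_a_query(domain: str, query_id: int = 0x1234) -> bytes:
    """Build a standard DNS query for the A record of a domain."""
    flags = 0x0100  # standard query with the "recursion desired" bit set
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)  # one question, no records
    qtype_a, qclass_in = 1, 1
    return header + encode_qname(domain) + struct.pack("!HH", qtype_a, qclass_in)


query = build_a_query("example.com")
print(query.hex())
```

The resulting bytes could be sent with `socket.sendto(query, (resolver_ip, 53))`, where `resolver_ip` is a placeholder for the resolver's address. Parsing the response is left out here.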
NAT / PAT
Network Address Translation / Port Address Translation
NAT was introduced to address the shortage of IPv4 addresses. With NAT, only each network, rather than every endpoint device, needs a publicly reachable IPv4 address on the internet. The devices within the network then communicate using this shared public IP address.
This is achieved by allowing a router to modify the source address of network packets. As a result, a target on the internet does not see the local address of the sending device as the sender’s address but instead sees the public address of the router. To correctly assign incoming responses from the internet to the appropriate local device, the router stores this information in a NAT table.
To prevent NAT tables from growing indefinitely, the router assigns a validity period to each entry in the NAT table. Once this time is reached, and no more data has been transmitted over the stored connection, the router deletes the entry.
PAT is often overlooked in discussions about NAT. The sending device typically uses port 5060 for SIP. If this source port is already in use on the WAN side, the router must not only change the source IP address to the WAN address but also change the source port to an available one.
Thus, a SIP packet sent by an endpoint device with the source address 192.168.0.11:5060 may leave the router with the source address 80.70.60.51:1024.
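The behavior described in this section can be modeled as a small table. Outgoing packets create or refresh an entry, and PAT picks another WAN port when the original one is taken. Idle entries expire after their validity period, and incoming packets are delivered only if a matching entry exists. The following Python sketch is deliberately simplified. The class name, the 30-second timeout and the port search starting at 1024 are assumptions for illustration, and real routers behave in many different ways.

```python
import time
from typing import Optional

Endpoint = tuple[str, int]  # (IP address, port)


class PatTable:
    """Simplified NAT/PAT table of a router with one WAN address (illustration only)."""

    def __init__(self, wan_ip: str, timeout: float = 30.0, first_free_port: int = 1024):
        self.wan_ip = wan_ip
        self.timeout = timeout                  # idle seconds until an entry is deleted
        self.first_free_port = first_free_port  # where the search for a replacement port starts
        self.entries: dict[int, tuple[Endpoint, float]] = {}  # WAN port -> (LAN endpoint, last use)

    def _expire(self, now: float) -> None:
        """Delete entries that have been idle longer than the timeout."""
        for port, (_endpoint, last_used) in list(self.entries.items()):
            if now - last_used > self.timeout:
                del self.entries[port]

    def outbound(self, lan: Endpoint, now: Optional[float] = None) -> Endpoint:
        """Translate the source of an outgoing packet; return the WAN address it leaves with."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        for port, (endpoint, _last_used) in self.entries.items():
            if endpoint == lan:                 # existing entry: refresh its timer
                self.entries[port] = (lan, now)
                return self.wan_ip, port
        port = lan[1]                           # try to keep the original source port
        if port in self.entries:                # already used on the WAN side -> PAT
            port = self.first_free_port
            while port in self.entries:
                port += 1
            if port > 65535:
                raise RuntimeError("no free WAN port left")
        self.entries[port] = (lan, now)
        return self.wan_ip, port

    def inbound(self, wan_port: int, now: Optional[float] = None) -> Optional[Endpoint]:
        """Return the LAN endpoint for an incoming packet, or None (packet dropped).

        In this sketch, only outgoing traffic refreshes an entry.
        """
        now = time.monotonic() if now is None else now
        self._expire(now)
        entry = self.entries.get(wan_port)
        return entry[0] if entry else None


nat = PatTable("80.70.60.51")
print(nat.outbound(("192.168.0.11", 5060), now=0))  # ('80.70.60.51', 5060)
print(nat.outbound(("192.168.0.12", 5060), now=1))  # ('80.70.60.51', 1024): 5060 is taken
print(nat.inbound(1024, now=2))                     # ('192.168.0.12', 5060)
print(nat.inbound(1024, now=100))                   # None: entry expired after 30 s idle
```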
CGNAT
Carrier-Grade NAT
CGNAT, also known as large-scale NAT or NAT444, is a network address translation (NAT) technique used by Internet Service Providers (ISPs) to manage the shortage of public IPv4 addresses and support the growing number of devices connected to the internet.
In a CGNAT deployment, multiple customers within an ISP network receive private IP addresses from a shared pool of addresses. These typically come from the RFC 1918 ranges 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16, or from the RFC 6598 range 100.64.0.0/10.
These private IP addresses are used for communication within the ISP network but cannot be directly accessed from the public internet.
When a device within the ISP network initiates communication with a target on the internet, such as accessing a website or connecting to a server, the CGNAT device performs address translation on the outgoing packets. It replaces the private source IP address of the packet with a public IP address from the ISP’s pool and maintains a mapping table to track the translation.
Similarly, when receiving incoming packets from the internet intended for a device within the ISP network, the CGNAT device performs reverse address translation. It replaces the public destination IP address with the corresponding private IP address before forwarding the packet to the intended recipient.
Advantages of CGNAT include:
- IPv4 address conservation: CGNAT allows ISPs to conserve public IPv4 addresses by multiplexing multiple customers behind a single public IP address.
- Scalability: CGNAT enables ISPs to support a large number of internet-connected devices within their network while minimizing the exhaustion of the public IPv4 address space.
- Network security: CGNAT provides a level of network security by hiding customers’ internal device IP addresses from the public internet, reducing exposure to potential attacks and unauthorized access.
Disadvantages of CGNAT include:
- Limited port availability: CGNAT can restrict the availability of ports for certain applications and services, potentially causing issues with peer-to-peer communication, online gaming, and other applications that require specific port configurations.
- Impact on peer-to-peer communication: CGNAT can affect peer-to-peer communication and certain network protocols that rely on end-to-end connectivity, as it introduces an additional layer of address translation and may restrict incoming connections.
- Complexity: Managing and troubleshooting CGNAT deployments can be complex, especially in large networks with high traffic volumes and dynamic address assignments.
Overall, CGNAT plays a crucial role in extending the lifespan of IPv4 and enables ISPs to provide internet access to a growing number of subscribers while addressing the limitations caused by IPv4 address exhaustion.
SIP-ALG
Since SIP devices are typically installed in a LAN, they only know their internal IP address, not the WAN address through which their packets reach the provider. As a result, SIP devices can only include the internal IP address in SIP packets. This led some router and firewall manufacturers to implement a feature (SIP-ALG, "Application Layer Gateway") that replaces the internal IP address in SIP packets with the WAN address of the connection. The idea was that SIP packets would then contain the correct address to which the provider should send incoming calls. However, most router and firewall manufacturers only implemented the replacement of the IP address. As described under "Addresses on the Internet," the port is also part of the address, not just the IP. If the router changes the port with PAT while SIP-ALG replaces only the IP, the SIP packets end up containing an incorrect address. In the PAT example above, the packet would announce 80.70.60.51:5060, although the router's NAT table maps only 80.70.60.51:1024 to the device.
For this reason, sipcall always recommends disabling SIP-ALG. The sipcall servers have NAT detection enabled, which recognizes private IP addresses in SIP packets and adjusts accordingly.
Unfortunately, some mobile network providers have SIP-ALG enabled in their CGNAT. If this is the case and the mobile provider does not wish to disable it, you can change the transport protocol to TCP on the SIP device. Often, TCP packets are not manipulated by SIP-ALG, whereas UDP packets are.
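The following Python sketch illustrates why a port-unaware SIP-ALG breaks incoming calls. It uses the PAT example from the NAT section: the device 192.168.0.11:5060 is seen by the server as 80.70.60.51:1024. A simple form of server-side NAT detection would replace a private Contact address with the address the packet actually came from. A rewritten, public-looking Contact, however, is trusted as-is. The function names and this detection rule are assumptions made for illustration. They do not describe sipcall's actual implementation.

```python
import ipaddress


def split_contact(uri: str) -> tuple[str, str, int]:
    """Split 'sip:user@host:port;params' into (user, host, port); port defaults to 5060."""
    user, _, hostport = uri.removeprefix("sip:").partition("@")
    hostport = hostport.split(";", 1)[0]          # drop URI parameters such as ;transport=udp
    host, _, port = hostport.partition(":")
    return user, host, int(port) if port else 5060


def naive_sip_alg(uri: str, wan_ip: str) -> str:
    """Typical SIP-ALG behavior: replace only the IP address, keep the original port."""
    user, _host, port = split_contact(uri)
    return f"sip:{user}@{wan_ip}:{port}"


def nat_aware_target(uri: str, observed_source: tuple[str, int]) -> tuple[str, int]:
    """Assumed server-side NAT detection: a private Contact is replaced by the packet's real source."""
    _user, host, port = split_contact(uri)
    try:
        private = ipaddress.ip_address(host).is_private
    except ValueError:                             # host is a domain name, not an IP address
        private = False
    return observed_source if private else (host, port)


device_contact = "sip:1001@192.168.0.11:5060"
observed = ("80.70.60.51", 1024)                   # source address after PAT
alg_contact = naive_sip_alg(device_contact, "80.70.60.51")

print(nat_aware_target(device_contact, observed))  # ('80.70.60.51', 1024): reachable
print(nat_aware_target(alg_contact, observed))     # ('80.70.60.51', 5060): wrong port
```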
NAT keep alive
To keep the configuration effort for users as low as possible, VoIP has been designed so that no port forwarding needs to be configured in the router for incoming calls. Instead, VoIP devices provide a "NAT keep alive" feature. When "NAT keep alive" is enabled on a device, it periodically sends an empty packet to the configured SIP server. This packet keeps the connection entry in the router's NAT table "alive," ensuring that incoming calls can be routed properly. For this to work, the interval between two packets must be shorter than the validity period of the NAT table entry.
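The following minimal Python sketch shows a keep-alive sender. It assumes a short CRLF payload, which is a common choice. What a particular device actually sends may differ: a truly empty datagram, a SIP OPTIONS request, or something else. It is essential that the keep-alives leave from the same socket (local port) used for SIP registration. Otherwise, they would refresh a different NAT entry.

```python
import socket
import time

KEEPALIVE_PAYLOAD = b"\r\n\r\n"  # assumption: CRLF keep-alive; devices differ in what they send


def send_keepalives(sock: socket.socket, server: tuple[str, int], interval: float, count: int,
                    payload: bytes = KEEPALIVE_PAYLOAD) -> None:
    """Send `count` keep-alive datagrams to the SIP server, `interval` seconds apart.

    `sock` must be the socket used for REGISTER so that the same NAT entry is refreshed,
    and `interval` must be shorter than the router's NAT timeout.
    """
    for i in range(count):
        sock.sendto(payload, server)
        if i < count - 1:
            time.sleep(interval)
```

A real device would run this loop indefinitely for as long as it is registered. The finite `count` only keeps the sketch testable.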
RTP keep alive
Because customers' devices are behind NAT, the SIP server waits to deliver the RTP streams until the customer's device sends an RTP stream to our server. Only once the device sends RTP packets does a connection through the NAT open, allowing the incoming RTP stream to be routed to the device. If a PBX is configured to forward a call to an external destination instead of to a device, the PBX has no RTP stream from a device that it could send out. As a result, no connection through the NAT is opened. The SIP server then delivers the RTP streams on neither the incoming nor the outgoing side of the forwarded call. In this case, RTP keep-alive must be enabled so that the PBX itself generates and sends RTP packets, thus opening a connection through the NAT. As soon as the SIP server receives the empty RTP keep-alive packets, it knows that the connection through the NAT is open and delivers the RTP streams for both sides.
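The server behavior described here resembles a generic technique often called symmetric RTP or media latching. The server stays silent until the first packet arrives on its RTP port, then sends media back to the address that packet came from. That address is the public side of the freshly opened NAT entry. The following Python sketch shows only this idea and is not the sipcall server's code. The function name and buffer size are illustrative.

```python
import socket


def latch_and_reply(rtp_sock: socket.socket, media: bytes) -> tuple[str, int]:
    """Wait for the first RTP packet from the customer side, then send media back to its source.

    Before a packet arrives, the server knows no address that is reachable through the
    customer's NAT, so it sends nothing. The source of the first packet (for example an
    RTP keep-alive) is the public address of the open NAT entry.
    """
    _data, source = rtp_sock.recvfrom(2048)   # blocks until the device or PBX sends something
    rtp_sock.sendto(media, source)            # reply to the observed (post-NAT) address
    return source
```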
QoS
For voice communication, good quality is assumed when the signal delay (mouth to ear in one transmission direction) remains below 200ms. A delay of 200-300ms is still considered good, while 300-400ms is barely acceptable. Above 400ms, intelligibility is no longer sufficient.
In modern networks, transmission capacity is so high that no additional delays should occur beyond the normal packet travel time. This keeps the signal delay approximately at the propagation time, which is usually below 100ms.
If the signal delay is too high, this indicates a network issue, such as insufficient NAT throughput on the firewall. In this case, it is more effective to resolve the problem, for example, by using a higher-performance firewall, rather than trying to prioritize RTP packets of the audio stream using QoS settings. Only when no further network optimization is possible should QoS settings be used to prioritize real-time data.
However, if multiple real-time streams slow each other down, QoS settings will not be able to correct the issue.
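If QoS prioritization is used after all, one common approach is for the endpoint to mark its RTP packets with the DSCP value EF (Expedited Forwarding, 46). QoS-aware routers and switches can then prioritize these packets. The following Python sketch shows the marking on a UDP socket. Marking alone changes nothing unless the network components along the path honor it. On Windows, the `IP_TOS` option may have no effect.

```python
import socket

DSCP_EF = 46               # Expedited Forwarding, commonly recommended for voice traffic
TOS_VALUE = DSCP_EF << 2   # DSCP sits in the upper 6 bits of the former ToS byte: 184 (0xB8)


def open_marked_rtp_socket() -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the DSCP value EF."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VALUE)
    return sock


rtp_sock = open_marked_rtp_socket()
print(rtp_sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS))
rtp_sock.close()
```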
OSI Model / DOD Model
There are several reference models for assigning network protocols to the layers of a network. The best-known is the OSI model with 7 layers, while Wireshark represents packets according to the DOD model (also known as the TCP/IP model).
OSI Model
Layer: | Name: | Protocols: |
---|---|---|
7 | Application Layer | Browser, Email, Instant Messaging |
6 | Presentation Layer | Translator between different data formats |
5 | Session Layer | RPC, HTTP, FTP, SIP, SDP, RTP |
4 | Transport Layer | TCP, UDP |
3 | Network Layer | IP, IPX |
2 | Data Link Layer | MAC |
1 | Physical Layer | Ethernet |
DOD Model
Layer: | Name: | Protocols: |
---|---|---|
4 | Process | HTTP, SMTP, FTP, Telnet, SSH |
3 | Host-to-Host | TCP, UDP |
2 | Internet | IP, IPX |
1 | Network Access | Ethernet |
Firewall
Since the ports used by the endpoints are dynamically negotiated, it is not possible to make the endpoints accessible for incoming calls from the internet using classic port forwarding. For this reason, we recommend configuring the firewall based on IP addresses rather than ports. Allow all inbound and outbound traffic to and from our SIP servers. You can find which IP addresses sipcall uses here.
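The rule recommended here filters on addresses, not on ports. The following minimal Python sketch shows that decision. The two networks are hypothetical placeholders taken from the RFC 5737 documentation ranges, not sipcall's actual server addresses. For inbound traffic, the remote address is the source, and for outbound traffic it is the destination.

```python
import ipaddress

# Hypothetical placeholders (RFC 5737 documentation ranges); use the provider's published addresses.
PROVIDER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_provider_address(remote_ip: str) -> bool:
    """IP-based rule: allow traffic to/from provider networks on any port, reject everything else."""
    ip = ipaddress.ip_address(remote_ip)
    return any(ip in network for network in PROVIDER_NETWORKS)


for remote in ("192.0.2.10", "203.0.113.5"):
    print(remote, "allow" if is_provider_address(remote) else "drop")
```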
SIP Requests
- REGISTER: Registers a client with the registrar
- INVITE: Invites a SIP participant to a session
- re-INVITE: An INVITE sent within an existing session to modify it
- ACK: Confirms the final response to an INVITE
- CANCEL: Cancels a pending INVITE that has not yet received a final response
- BYE: Ends a session
- OPTIONS: Queries the capabilities of a SIP participant, such as its supported request methods
SIP Response
- 1XX (Provisional): The request has been received and is being processed
- 2XX (Success): The request has been successfully processed
- 3XX (Redirection): The request must be sent to another address, e.g., the call is redirected to another participant
- 4XX (Client Error): A client-side error has occurred
- 5XX (Server Error): A server-side error has occurred
- 6XX (Global Failure): The request cannot be fulfilled by any server
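A status line such as "SIP/2.0 486 Busy Here" consists of the version, the status code and a reason phrase. The first digit of the code determines the response class. The following minimal Python sketch maps a status line to its class. The function name is illustrative.

```python
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_response(status_line: str) -> tuple[int, str, str]:
    """Split a status line such as 'SIP/2.0 486 Busy Here' into (code, class, reason phrase)."""
    version, code_text, reason = status_line.split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"not a SIP/2.0 status line: {status_line!r}")
    code = int(code_text)
    if not 100 <= code <= 699:
        raise ValueError(f"status code out of range: {code}")
    return code, RESPONSE_CLASSES[code // 100], reason


for line in ("SIP/2.0 180 Ringing", "SIP/2.0 200 OK", "SIP/2.0 486 Busy Here", "SIP/2.0 603 Decline"):
    print(classify_response(line))
```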
Important Parameters in the SIP Packet
SIP header fields:
- From: Display name, SIP URI, and number of the caller
- To: Display name, SIP URI, and number of the destination
- Contact: SIP URI (possibly temporary) at which the user agent can be reached directly
- Authorization: Authentication credentials; instead of the password itself, a hash (digest) calculated from the username, the password, and values provided by the server is transmitted
- Call-ID: A value generated randomly by the initiator; identifies all requests and responses belonging to the same session
- CSeq: A sequence number followed by the request method; it starts at an arbitrary value and is incremented for new requests, so that each SIP response can be matched to the request it refers to
- User-Agent: Name of the user agent (device or software)
SDP fields (in the message body):
- o=: Origin: name of the person initiating the media session, followed by a randomly generated session ID, the session version, network type, and address
- c=: Connection data: network type and the IP address at which the respective session participant receives the payload
- m=: Media description: media type, port, transport protocol, and codecs (payload types)
- a=: Attributes, e.g., explanations of the codecs listed in the "m" line
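The following minimal Python sketch shows where these fields live in a message. It splits a made-up INVITE into its start line, header fields and SDP body, then extracts the `c=`, `m=` and `a=rtpmap` information. The message, the user agent name and the addresses are invented for illustration. The parser is simplified: it assumes each header name appears once, no compact header forms, and a single media line.

```python
SDP_BODY = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 192.168.0.11\r\n"
    "s=-\r\n"
    "c=IN IP4 192.168.0.11\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 9 8 0\r\n"
    "a=rtpmap:9 G722/8000\r\n"
    "a=rtpmap:8 PCMA/8000\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
)
SAMPLE_INVITE = (
    "INVITE sip:1002@sip.example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.168.0.11:5060;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    'From: "Alice" <sip:1001@sip.example.com>;tag=1928301774\r\n'
    "To: <sip:1002@sip.example.com>\r\n"
    "Call-ID: a84b4c76e66710@192.168.0.11\r\n"
    "CSeq: 1 INVITE\r\n"
    "Contact: <sip:1001@192.168.0.11:5060>\r\n"
    "User-Agent: ExamplePhone/1.0\r\n"
    "Content-Type: application/sdp\r\n"
    f"Content-Length: {len(SDP_BODY.encode())}\r\n"
    "\r\n" + SDP_BODY
)


def parse_sip_message(raw: str) -> tuple[str, dict[str, str], str]:
    """Split a SIP message into start line, header fields (name -> value) and body."""
    head, _, body = raw.partition("\r\n\r\n")       # an empty line separates headers and body
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body


def parse_sdp(body: str) -> dict:
    """Extract connection address, media port, payload types and codec names from an SDP body."""
    info: dict = {"codecs": {}}
    for line in body.splitlines():
        key, _, value = line.partition("=")
        if key == "c":                                    # e.g. "IN IP4 192.168.0.11"
            info["address"] = value.split()[2]
        elif key == "m":                                  # e.g. "audio 49170 RTP/AVP 9 8 0"
            media, port, protocol, *payload_types = value.split()
            info.update(media=media, port=int(port), protocol=protocol, payload_types=payload_types)
        elif key == "a" and value.startswith("rtpmap:"):  # e.g. "rtpmap:8 PCMA/8000"
            payload_type, encoding = value[len("rtpmap:"):].split(" ", 1)
            info["codecs"][payload_type] = encoding
    return info


start, headers, body = parse_sip_message(SAMPLE_INVITE)
print(start, "|", headers["Call-ID"], "|", headers["CSeq"])
print(parse_sdp(body))
```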
SIP URI
SIP Uniform Resource Identifier
A SIP URI represents the contact address of a SIP endpoint. Its function is comparable to the telephone number of former telecommunications networks. The structure of a SIP URI follows the format of an email address with a prefixed protocol designation.
sip:User@Host
"User" represents an individual username, while "Host" corresponds to an IP address or a domain name.
Codecs
Basics
There are various techniques to digitize spoken language. This is done using so-called codecs.
The word "codec" is composed of the following two words: Coder and Decoder. A codec is a software component responsible for encoding analog signals into a digital data stream and decoding them back.
Digitization of Analog Signals
During the analog-to-digital conversion at the sender, the analog audio signal is sampled. The obtained time-discrete but still amplitude-continuous sampled values are then quantized, which means that a specific amplitude range is assigned a discrete amplitude value. The result is a time- and amplitude-discrete signal.
To transmit this as a digital signal, the individual amplitude values are mapped into digital codewords, specifically 0/1 sequences. This process is called encoding.
Depending on the codec used, encoding also exploits the fact that the original analog speech signal contains redundant information. This redundant information can be removed during encoding, which compresses the speech signal.
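The three steps described above are sampling, quantization and encoding. The following simplified Python sketch applies them to a 1 kHz test tone. It assumes 8,000 samples per second and 8-bit codewords, the values used for narrowband telephony, and shows why this results in 64 kbit/s. The sketch uses uniform quantization for simplicity. G.711 itself uses logarithmic A-law or µ-law companding, and it does not remove redundancy.

```python
import math

SAMPLE_RATE = 8000       # samples per second
BITS_PER_SAMPLE = 8      # 8-bit codewords -> 256 quantization levels
LEVELS = 2 ** BITS_PER_SAMPLE


def sample_and_quantize(frequency: float, duration: float) -> list[int]:
    """Sample a sine tone and map each sample to one of 256 discrete levels (uniform quantization)."""
    codes = []
    for n in range(round(SAMPLE_RATE * duration)):
        t = n / SAMPLE_RATE                                  # time-discrete: only these instants are measured
        amplitude = math.sin(2 * math.pi * frequency * t)   # still amplitude-continuous (-1.0 .. 1.0)
        level = round((amplitude + 1) / 2 * (LEVELS - 1))   # amplitude-discrete: 0 .. 255
        codes.append(level)
    return codes


codes = sample_and_quantize(frequency=1000, duration=0.001)  # 8 samples of a 1 kHz tone
print([format(c, "08b") for c in codes])                     # encoding: each value as a 0/1 sequence
print(SAMPLE_RATE * BITS_PER_SAMPLE, "bit/s")                # 64000 bit/s = 64 kbit/s
```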
Recommended Codecs
Since major providers only allow a limited number of codecs on carrier connections, we recommend using the following codecs in the order of priority listed:
Audio:
- G.722 - HD codec with high quality
- PCMA (G.711-alaw) - Standard codec in Europe
- PCMU (G.711-ulaw) - Standard codec in America
Video:
- H.264 - Widely used video codec
Transcoding
Transcoding refers to the process of decoding a data stream and re-encoding it with a different codec. This is necessary when device A uses a different codec than device B, because a direct exchange of media data between the two devices is then not possible. If the data stream is properly transcoded in between, the media exchange works.
However, transcoding should be avoided if possible by using appropriate codec settings on the endpoints, as it requires considerable computing power on the transcoding server.
Network Bandwidth in VoIP
In recent years, internet connections have become increasingly fast, making bandwidth conservation generally unnecessary. However, if a bandwidth calculation is still required, you can refer to the following codec bitrates (per direction, excluding RTP/UDP/IP header overhead):
Codec: | Bitrate: |
---|---|
G.711-alaw | 64kbit/s |
G.711-ulaw | 64kbit/s |
G.722 | 64kbit/s |
Opus | 6-510kbit/s |
H.264 | 64kbit/s-240Mbit/s |
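On the network, every audio packet also carries RTP, UDP and IP headers, so the actual bandwidth is higher than the codec bitrate in the table. The following Python sketch estimates this for a fixed-rate codec. It assumes 20 ms packets (50 packets per second), IPv4 without options, and optionally Ethernet framing without VLAN tag, preamble or inter-frame gap. The function name is illustrative.

```python
RTP_HEADER = 12          # bytes
UDP_HEADER = 8
IPV4_HEADER = 20         # without IP options
ETHERNET_OVERHEAD = 18   # 14-byte header + 4-byte FCS (no VLAN tag, preamble or gap)


def voip_bandwidth_kbit(codec_kbit: float, packet_ms: float = 20, layer2: bool = True) -> float:
    """Bandwidth of one call direction in kbit/s, including protocol headers."""
    packets_per_second = 1000 / packet_ms
    payload_bytes = codec_kbit * 1000 / 8 * packet_ms / 1000   # 64 kbit/s, 20 ms -> 160 bytes
    header_bytes = RTP_HEADER + UDP_HEADER + IPV4_HEADER + (ETHERNET_OVERHEAD if layer2 else 0)
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000


print(voip_bandwidth_kbit(64, layer2=False))  # 80.0 kbit/s at the IP layer (G.711 or G.722)
print(voip_bandwidth_kbit(64))                # about 87.2 kbit/s including Ethernet framing
```

Since a call transmits audio in both directions, the link must carry this bandwidth upstream and downstream at the same time.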
End Devices / User Agent
There are various end devices such as desk phones, softphones, MS Teams with telephony integration, and telephone systems. For analog devices, there are SIP adapters that allow continued use of analog equipment. Although the physical designs differ significantly, these devices do not differ in terms of SIP functionality.
Tracing
In networking, tracing means recording and saving the network traffic between computers or devices. All packets transmitted over the network are logged and often stored in a file (usually in PCAP format).
Tracing allows you to find out which information was transmitted by whom and when. This makes it very useful for troubleshooting, since individual network packets can be examined precisely.
Wireshark
Wireshark is free software for analyzing and visualizing network protocols. Its integrated "network sniffer" can also record network packets.
Wireshark provides specialized tools for analyzing SIP packets, significantly simplifying the analysis. For unencrypted data, Wireshark allows viewing network packet data in plaintext. This means SIP data can be seen, and even the audio portion can be listened to.
Switch with Port Mirroring (Port Duplication)
Port mirroring copies (mirrors) the network traffic passing through one port of a switch to a different port. A computer connected to that second port can then receive the packets of the mirrored port and record them with Wireshark.
Other Methods for Tracing
Apart from a switch using port mirroring, there are other ways to create a trace:
- Many phones offer a trace function via the web interface. We have also created detailed instructions for this.
- In 3CX, tracing can be started via the PBX web interface.
- For PBX systems that do not offer a tracing function via the web interface, tracing can also be performed at the operating system level:
- On Linux with tcpdump
- On Windows with Wireshark
- If a firewall is installed in the network, tracing can also be performed using the firewall.
Common Issues (Cause/Solution)
User Agent Does Not Register
The two most common causes of a failed registration are:
- Firewall: If a firewall blocks the REGISTER packets, the user agent cannot register.
- Incorrect password: Sometimes, when copying the password, an extra space is accidentally included, making the password incorrect.
No Incoming Calls
The two most common causes for not receiving incoming calls are:
- SIP-ALG: An active SIP-ALG can cause the address for incoming calls to be incorrect, so SIP-ALG should always be disabled.
- Firewall: If a firewall blocks incoming calls, the firewall configuration must be adjusted accordingly.
One-Way Voice
If, during an established call, the caller can be heard by the other party but cannot hear the other party themselves, this is referred to as "One-Way Voice." The cause is often a firewall blocking incoming RTP packets.
Details regarding network and firewall configuration can be found here.
Poor Call Quality
For optimal call quality, it is important that an audio stream packet with 20ms of audio content is transmitted every 20ms. If you experience poor call quality, there can be various causes:
Packet Loss
If packets are lost along the route between the endpoint and the sipcall server, call quality decreases. The cause of packet loss is often a malfunction in a network component or the internet connection. Restarting the affected device often helps. If the internet connection is the cause, the internet provider must ensure that packet loss no longer occurs.
Jitter / Delays
It is rather rare, but sometimes packets are transmitted with delays, or with varying delays (jitter). In audio transmissions, delayed packets are discarded because they cannot be played back late. Delays are often caused by overloaded network components. Consider a firewall designed for a 10Mbit/s internet connection that is suddenly connected to a 1Gbit/s connection after an upgrade. The increased traffic can overload the firewall's CPU, which can lead to packets being delivered with delays.
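Packet loss and jitter can both be estimated from the packets a receiver sees. Gaps in the RTP sequence numbers reveal lost packets. Jitter is commonly estimated with the smoothing formula from RFC 3550, J += (|D| - J) / 16, where D is the change in transit time between consecutive packets. RTCP receiver reports carry this value. The following Python sketch is simplified. It uses send times in seconds instead of RTP timestamps and ignores sequence-number wraparound. The function name and input format are illustrative.

```python
def jitter_and_loss(packets: list[tuple[int, float, float]]) -> tuple[float, int]:
    """Estimate interarrival jitter (RFC 3550 formula) and count missing sequence numbers.

    Each packet is (sequence number, send time in s, arrival time in s), in arrival order.
    """
    jitter = 0.0
    for previous, current in zip(packets, packets[1:]):
        transit_prev = previous[2] - previous[1]
        transit_cur = current[2] - current[1]
        d = abs(transit_cur - transit_prev)     # change in network transit time
        jitter += (d - jitter) / 16             # smoothing as specified in RFC 3550
    sequences = [seq for seq, _, _ in packets]
    expected = max(sequences) - min(sequences) + 1
    lost = expected - len(set(sequences))
    return jitter, lost


# Packets every 20 ms; packet 3 is missing and packet 5 arrives 20 ms later than usual.
stream = [(1, 0.00, 0.050), (2, 0.02, 0.070), (4, 0.06, 0.110), (5, 0.08, 0.150)]
print(jitter_and_loss(stream))  # jitter of about 0.00125 s, 1 lost packet
```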
Analog Interference
If the endpoint being used has a defect in its analog components (e.g., microphone or speaker), this is often not visible in SIP log files. In such cases, Wireshark helps with analysis because you can listen to recorded calls. If you hear noise, for example, this may indicate a defect.
Ghost Calls
Since endpoints keep a connection open through the router to remain available for incoming calls, all SIP packets arriving at the router on the corresponding port are forwarded to the endpoint. Hackers exploit this by sending SIP packets to endpoints over the internet. Unfortunately, some endpoints ring for every incoming call, even if it is not legitimate. Such calls are referred to as ghost calls.
To prevent ghost calls, please install a firewall and allow incoming packets only from the sipcall VoIP servers.
You can find which IP addresses sipcall uses here.
Security
Since SIP devices and telephone systems are accessible from the internet, it is important to keep the devices up to date.
Equally important is the length of the passwords used. These should be at least 15 characters long.
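A long random password is easiest to produce with a cryptographically secure generator rather than by hand. The following minimal Python sketch uses the standard `secrets` module. The default length of 20 and the letters-and-digits alphabet are arbitrary choices for this sketch. With 62 possible characters, 20 characters give about 119 bits of entropy, and the password contains no spaces that could slip in when copying.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters, no whitespace


def generate_sip_password(length: int = 20) -> str:
    """Generate a random password from a cryptographically secure random source."""
    if length < 15:
        raise ValueError("use at least 15 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_sip_password())
```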
Additionally, a firewall should restrict communication to the designated provider to prevent SIP packets from being sent directly to the devices.