Hacking WhatsApp, part 2 - analyzing the WhatsApp VoIP protocol
In this article I want to tell you how I cracked several parts of the WhatsApp VoIP protocol using a jailbroken iOS device and a set of different analysis programs.
Recently, WhatsApp has received a lot of attention because of its vulnerabilities and the opportunities they give attackers.
That makes it an interesting target for security research.
If you are interested too, read on.
Although WhatsApp's official pages describe its encryption, there is no detailed public information on how that encryption works or how it is built into the protocol.
Hence, there is no basis for a detailed security analysis of WhatsApp itself.
My research was based on three things:
1. Network traffic analysis
2. Binary analysis
3. Analysis of application behavior in different modes
Toolkit
To analyze the WhatsApp client for iOS, I used the following tools:
- Binary decryption - bfdecrypt
- Binary disassembly - Hopper Disassembler and radare2
- Network traffic analysis - Wireshark
- Runtime analysis of application behavior - Frida
How I set up the jailbreak on iOS is beyond the scope of this article.
Analyzing network traffic
In this part, we will analyze the network traffic of the WhatsApp client during a call, recorded with Wireshark.
To record such traffic, I created a remote virtual network interface.
The command on macOS looks like this:
rvictl -s <device UUID>
Here <device UUID> must be replaced with the UUID of the device running the WhatsApp client.
Wireshark detects the use of Session Traversal Utilities for NAT (STUN).
STUN is a signaling protocol that is required to establish peer-to-peer connections between clients.
Here the WhatsApp client uses TCP packets to communicate with various WhatsApp servers.
At the same time, UDP packets are used for exchange between clients.
Hundreds of UDP packets pass in a minute.
WhatsApp uses the Secure Real-time Transport Protocol (SRTP), so these UDP packets evidently carry the SRTP data of the call.
SRTP provides encryption, authentication and replay protection for RTP traffic.
Let's take a closer look at the SRTP packets exchanged between sides A and B.
To do this, convert them to hexadecimal:
It can be seen that the packets contain the RTP header fields typical of SRTP.
The first four bytes (highlighted in red) hold 7 RTP header fields.
Let's consider them in more detail:
0x8078001e = 0b10_0_0_0000_0_1111000_0000000000011110 = V = 10 | P = 0 | X = 0 | CC = 0000 | M = 0 | PT = 1111000 | SEQ = 0000000000011110
The first 2 bits contain the version number (V); here it is version 2.
The third bit is the padding flag (P); here it is not set.
The fourth bit is the extension flag (X); here it is 0, meaning that no extension header follows the fixed RTP header.
Bits 5 to 8 contain the number of CSRC identifiers (CC) that follow the fixed header; here it is 0.
A CSRC (contributing source) is a source of RTP packets that contributes to the combined stream produced by an RTP mixer. The mixer inserts the SSRC identifiers of the contributing sources into the header of the RTP packets it sends. This list is called the CSRC list. An example is an audio conference where the mixer marks all speakers whose voices went into the outgoing packets. This allows the receiving side to identify the current speaker, although all packets carry the same SSRC ID.
Bit 9 is the marker bit (M). It is used at the application level and its meaning is defined by the profile. If it is set, the packet data has some special meaning for the application.
The next 7 bits are the payload type (PT), here 120. The RTP and SRTP standards do not assign a fixed meaning to this value, so it is most likely a custom one chosen by WhatsApp.
The last 16 bits are the sequence number (SEQ), here 30. It is incremented by 1 for each RTP data packet sent, so the receiver can use it to detect packet loss and to restore the original order of the packets. According to the standard, the initial value should be random, but WhatsApp does not follow this recommendation: as the Wireshark data shows, its initial value is always 0.
The next 4 bytes (highlighted in blue) are the packet timestamp.
The 4 bytes after that (green) are the SSRC field, which identifies the synchronization source. This identifier is chosen at random so that no two synchronization sources within one RTP session share the same SSRC. All applications must be able to detect SSRC collisions. If the sender changes its transport address, it must also choose a new SSRC identifier.
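To make the field layout concrete, here is a minimal Python sketch that decodes the 12-byte fixed RTP header as laid out in RFC 3550 (2-bit version, padding, extension, 4-bit CSRC count, marker, 7-bit payload type, 16-bit sequence number, then timestamp and SSRC). Only the first four bytes come from the capture discussed above; the timestamp and SSRC values in the sample are made-up placeholders, and the names RtpHeader and parse_rtp_header are purely illustrative.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: int
    extension: int
    csrc_count: int
    marker: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 header bytes")
    # Network byte order: two single bytes, one 16-bit and two 32-bit fields.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=(b0 >> 5) & 0x1,
        extension=(b0 >> 4) & 0x1,
        csrc_count=b0 & 0x0F,
        marker=b1 >> 7,
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
    )


# First word taken from the article; timestamp and SSRC are placeholders.
sample = bytes.fromhex("8078001e" "00000000" "deadbeef")
header = parse_rtp_header(sample)
print(header)
```

Run on the first word 0x8078001e, the sketch reports version 2, payload type 120 and sequence number 30, with all flag bits cleared.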
So, we found out that WhatsApp uses SRTP to protect calls.
This is confirmed by the structure of the UDP packets exchanged between WhatsApp clients.
WhatsApp also uses TCP to exchange data between the client and the server.
Below we will see how the Noise Pipes Protocol is used to encrypt this part.
Binary analysis
The WhatsApp client for iOS contains 2 main binaries: the WhatsApp application binary and the WhatsApp core framework.
In this part, we'll take a closer look at them with Hopper Disassembler and radare2.
These binaries are encrypted when downloaded from the App Store.
Here the jailbroken iOS device let us get around Apple's protection and gain access to these files.
The WhatsApp binaries were then decrypted with bfdecrypt.
Next, I'll show you how I gathered information about the protocol fundamentals, algorithms, and open source libraries that WhatsApp uses.
Open source libraries are especially interesting because they can be analyzed easily.
libsignal-protocol-c
WhatsApp uses libsignal-protocol-c, an open source library that implements the Signal Protocol.
The protocol is based on the Double Ratchet Algorithm, which encrypts WhatsApp messages.
This library was identified in the WhatsApp binaries by the following characteristic features:
libsrtp
WhatsApp also uses libsrtp to implement its Secure Real-time Transport Protocol.
The symbol names have been stripped from the WhatsApp binaries, but the binaries still contain strings that point directly to libsrtp:
Noise Protocol Framework
According to official reports, WhatsApp uses the Noise Protocol Framework for secure communication between clients and servers.
The Noise Protocol Framework was designed for building easy-to-use cryptographic protocols from a set of discrete building blocks.
Strictly speaking, though, WhatsApp only uses the Noise Pipes Protocol, which is derived from the more complete Noise Protocol Framework.
These lines were found in the watzap binaries:
“Noise_XX_25519_AESGCM_SHA256”,
• “Noise_IK_25519_AESGCM_SHA256”,
• “Noise_XXfallback_25519_AESGCM_SHA256”.
These strings name the handshake patterns implemented in WhatsApp clients.
The first string belongs to the WANoiseFullHandshake class.
The second belongs to WANoiseResumeHandshake and the last to WANoiseFallbackHandshake.
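The three strings follow the naming convention of the Noise Protocol Framework: the prefix Noise, then the handshake pattern (optionally with modifiers such as fallback), the Diffie-Hellman function, the cipher and the hash, joined by underscores. A small Python sketch for splitting such a name into its parts might look like this; NoiseName and parse_noise_name are illustrative names, not something taken from the WhatsApp binaries.

```python
import re
from typing import List, NamedTuple


class NoiseName(NamedTuple):
    pattern: str          # base handshake pattern, e.g. "XX" or "IK"
    modifiers: List[str]  # pattern modifiers, e.g. ["fallback"]
    dh: str               # Diffie-Hellman function, e.g. "25519"
    cipher: str           # AEAD cipher, e.g. "AESGCM"
    hash_name: str        # hash function, e.g. "SHA256"


def parse_noise_name(name: str) -> NoiseName:
    """Split a Noise protocol name into pattern, DH, cipher and hash."""
    parts = name.split("_")
    if len(parts) != 5 or parts[0] != "Noise":
        raise ValueError(f"not a Noise protocol name: {name!r}")
    _, pattern, dh, cipher, hash_name = parts
    # The base pattern is upper case; modifiers follow in lower case, '+'-separated.
    match = re.fullmatch(r"([A-Z0-9]+)([a-z0-9+]*)", pattern)
    if match is None:
        raise ValueError(f"unexpected handshake pattern: {pattern!r}")
    base, mods = match.groups()
    return NoiseName(base, mods.split("+") if mods else [], dh, cipher, hash_name)


for name in (
    "Noise_XX_25519_AESGCM_SHA256",
    "Noise_IK_25519_AESGCM_SHA256",
    "Noise_XXfallback_25519_AESGCM_SHA256",
):
    print(parse_noise_name(name))
```

All three names share Curve25519, AES-GCM and SHA-256 and differ only in the handshake pattern: XX, IK, and XX with the fallback modifier.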
How this protocol works in detail is beyond the scope of this article.
Runtime Analysis
In this part, we will explore the behavior of the WhatsApp client using Frida.
Frida is a so-called dynamic instrumentation toolkit: a set of tools that let you inject your own code into other applications on the fly.
We will attach to a process of the application and change its behavior using an interactive JS console.
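The article works from Frida's interactive JS console; the same kind of tracing can also be driven from a script. The sketch below uses Frida's Python bindings to attach to a process on a USB-connected device and log every Objective-C method that matches a resolver query. It is only a sketch under assumptions: the process name "WhatsApp" and the query "-[WANoise* *]" are guesses for illustration, the frida package must be installed separately, and the exact JavaScript API available depends on the Frida version in use.

```python
import json


def build_trace_script(query: str) -> str:
    """Return Frida agent source (JavaScript) that reports every call to an
    Objective-C method matching an ApiResolver query such as '-[WANoise* *]'."""
    return """
const resolver = new ApiResolver('objc');
for (const match of resolver.enumerateMatches(%s)) {
  Interceptor.attach(match.address, {
    onEnter(args) {
      send(match.name);  // report the method name to the Python side
    }
  });
}
""" % json.dumps(query)


def trace(process_name: str, query: str) -> None:
    """Attach to a running app on a USB device and print matching calls."""
    import frida  # third-party package, imported lazily: pip install frida

    session = frida.get_usb_device().attach(process_name)
    script = session.create_script(build_trace_script(query))
    script.on("message", lambda message, data: print(message.get("payload")))
    script.load()
    input("Tracing, press Enter to stop\n")
    session.detach()


# Building the agent source needs no device; trace() does.
source = build_trace_script("-[WANoise* *]")
print(source)
```

With a jailbroken device attached and the Frida server running on it, a call like `trace("WhatsApp", "-[WANoise* *]")` would print the name of each matching method as it is invoked.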
Key Transport
In this part, we will explore how the key mechanisms of the WhatsApp protocol work.
According to WhatsApp's official description of VoIP call encryption, the initiator of the call generates a random 32-byte SRTP master secret.
An encrypted message containing this SRTP master secret is then transmitted to side B.
Side B then uses this information to reconstruct the master secret.
I managed to produce a fake missed-call notification from A to B, although the call was actually initiated by Mallory.
This became possible by overwriting the call-creator and from parameters with the JID of side A.
Mallory's name is still shown in the notification, though.
When side B responds to such a notification, side A is called instead of Mallory.
This behavior deserves a closer analysis later.
Let's summarize the intermediate results: in WhatsApp, the encrypted master secret is packed into a Signal message, which is added to an XMPP stanza.
The XMPP stanza also contains the ID and JID of both sides.
Transferring the master secret to the other party
According to the official description, WhatsApp clients use the Noise Pipes protocol with Curve25519, AESGCM, and SHA256 from the Noise Protocol Framework.
If you trace methods whose names contain keywords related to the Noise Protocol Framework, you can see that the WANoiseStreamCipher class is used to encrypt the data sent to WhatsApp servers.
The class uses the encryptPlaintext method.
After the call is initiated, the plaintext value is the XMPP message described above.
The message is then encrypted again with mbedtls_gcm_crypt_and_tag from the mbed TLS library.
The key passed to mbedtls_gcm_setkey is 256 bits long, which means AES-256-GCM is used.
The encryption key comes from the Noise Pipes Protocol, which is not covered in this article.
The encrypted message then goes over TCP to the WhatsApp server (this can be seen with Wireshark).
The server then forwards this message to the called party to initiate the call.
Key Derivation
In this part, we will look at how the key derivation function (KDF) works.
The results were obtained with Frida by tracing the WAHKDF class and the libcommonCrypto library.
The WAHKDF class is used to derive keys, salts and nonces when initializing SRTP streams.
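The article does not show what WAHKDF does internally. Assuming, from the class name alone, that it implements standard HKDF (RFC 5869), and further assuming SHA-256 as the hash, the derivation step could be sketched in Python as follows. The salt, the info label and the output length are hypothetical placeholders, and the random 32-byte value only stands in for the SRTP master secret.

```python
import hashlib
import hmac
import secrets


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): condense input key material into a pseudorandom key."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): stretch the pseudorandom key to `length` bytes."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """Extract-then-expand in one call."""
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)


# Placeholder input: a random 32-byte value standing in for the master secret.
master_secret = secrets.token_bytes(32)
okm = hkdf(master_secret, salt=b"", info=b"hypothetical label", length=46)
print(okm.hex())
```

Calling hkdf several times with different info labels yields independent outputs from the same master secret, which is one plausible reason a derivation method would be invoked repeatedly during call setup.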
The deriveSecretsFromInputKeyMaterial method is called 10 times before the call starts: