The 0-RTT Feature of TLS 1.3 Can Be Used As an Encrypted Steganographic Channel to Operate a Backdoor into an Enterprise Network

The TLS 1.3 specification in RFC 8446 allows the client to send application data to the server immediately after the ClientHello message, with zero round-trip time, and refers to that data as 0-RTT data or early data.

A server that receives early data may accept it or reject it. Rejected data is ignored by the server but seen by all routers, switches, firewalls and other network appliances in the network path from the client to the server. Therefore an attacker-controlled client can use rejected early data as a steganographic channel to communicate with any compromised network appliance situated in the network path. Furthermore, neither the server nor any of the TLS visibility solutions currently on the market that I surveyed in an earlier post attempts to decrypt rejected early data. Hence the attacker-controlled client can encrypt the channel using a key unknown to the server but shared with the compromised appliance without risking detection.

An attacker who has implanted persistent malware on an enterprise network appliance can therefore use rejected early data as an encrypted steganographic channel to send command-and-control (C2) instructions from an external client to the implant in the compromised appliance and thus operate a backdoor into the enterprise network.

In this post I go over some of the details of the 0-RTT feature of TLS 1.3, describe several methods that an attacker-controlled client can use to cause rejection of early data by the server, sketch out an attack scenario and propose mitigations.

Details of the 0-RTT Feature

Early data is protected using early traffic keys derived from a pre-shared key (PSK) and an AEAD algorithm specified by a cipher suite associated with the PSK. (The early traffic keys consist of an AEAD key and an AEAD IV from which per-record AEAD nonces are derived.) The client and the server may share multiple PSKs, each identified by a “PSK identity”. When it wants to use a PSK, the client includes a pre_shared_key extension in the ClientHello message, offering a list of PSK identities. When it wants to send early data, the client also includes in the message an early_data extension announcing that early data will follow. The client derives the early traffic keys from the PSK whose identity is listed first in the pre_shared_key extension, which I will call the primary PSK.
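The derivation just described can be illustrated with a minimal sketch that follows the key schedule of Section 7.1 and the per-record nonce construction of Section 5.3 of RFC 8446, using only the Python standard library. The function names, the default SHA-256 hash and the default key and IV lengths (those of AES-128-GCM) are assumptions chosen for illustration, and client_hello is assumed to be the bytes of the ClientHello handshake message whose transcript hash enters the derivation.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes, hash_name: str = "sha256") -> bytes:
    """HKDF-Extract (RFC 5869): PRK = HMAC-Hash(salt, IKM)."""
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int, hash_name: str = "sha256") -> bytes:
    """HKDF-Expand (RFC 5869): concatenate T(1), T(2), ... and truncate."""
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        out += block
        counter += 1
    return out[:length]


def hkdf_expand_label(secret: bytes, label: bytes, context: bytes,
                      length: int, hash_name: str = "sha256") -> bytes:
    """HKDF-Expand-Label of RFC 8446, Section 7.1 (label prefixed with 'tls13 ')."""
    full_label = b"tls13 " + label
    info = (length.to_bytes(2, "big")
            + bytes([len(full_label)]) + full_label
            + bytes([len(context)]) + context)
    return hkdf_expand(secret, info, length, hash_name)


def early_traffic_keys(psk: bytes, client_hello: bytes, key_len: int = 16,
                       iv_len: int = 12, hash_name: str = "sha256"):
    """Derive the AEAD key and IV that protect early data under the primary PSK."""
    hash_len = hashlib.new(hash_name).digest_size
    early_secret = hkdf_extract(b"\x00" * hash_len, psk, hash_name)
    transcript_hash = hashlib.new(hash_name, client_hello).digest()
    secret = hkdf_expand_label(early_secret, b"c e traffic",
                               transcript_hash, hash_len, hash_name)
    key = hkdf_expand_label(secret, b"key", b"", key_len, hash_name)
    iv = hkdf_expand_label(secret, b"iv", b"", iv_len, hash_name)
    return key, iv


def record_nonce(iv: bytes, sequence_number: int) -> bytes:
    """Per-record nonce: the IV XORed with the left-padded record sequence number."""
    padded = sequence_number.to_bytes(len(iv), "big")
    return bytes(a ^ b for a, b in zip(iv, padded))
```

Because the transcript hash of the ClientHello message is an input, any change to the ClientHello changes the early traffic keys, a fact that matters for the replay checks discussed later in the post.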

On seeing the client's early_data extension, the server must behave in one of three ways, listed on page 53 of RFC 8446.

It may reject the early data by discarding it and otherwise acting as if the client had not included an early_data extension in the ClientHello message. The client does send the early data, which travels to the server over the network path, followed by handshake messages. The early data is encrypted under the early traffic keys using the AEAD algorithm specified by the cipher suite associated with the primary PSK, while the handshake messages that follow the early data are encrypted under the client's handshake traffic keys, which are different from the early traffic keys with overwhelming probability, using an AEAD algorithm chosen by the server, which may be different from the one associated with the primary PSK if the server does not accept the early data. The server “skips past early data by attempting to deprotect received records using the handshake traffic key, discarding records which fail deprotection”.
Or it may reject the early data by sending a HelloRetryRequest message asking the client to send a second ClientHello message. Again the early data travels to the server, this time followed by the second ClientHello message. The TLS records that carry the second ClientHello message are in the clear, with content type “handshake”, while those that carry the early data before them are encrypted. The server “ignores early data by skipping all records with an external content type of "application_data" (indicating that they are encrypted)”.
Or it may accept the early data.
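The HelloRetryRequest case lends itself to a short illustration, because the skipping rule depends only on the outer content type of each record. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the input is assumed to be the raw bytes sent by the client after its first ClientHello, and compatibility-mode change_cipher_spec records are dropped as Section 5 of RFC 8446 requires.

```python
from typing import Iterator, List, Tuple

CHANGE_CIPHER_SPEC = 20
HANDSHAKE = 22
APPLICATION_DATA = 23  # outer type of every encrypted TLS 1.3 record


def iter_records(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (outer content type, fragment) for each complete TLS record."""
    pos = 0
    while pos + 5 <= len(stream):
        content_type = stream[pos]
        # Bytes 1-2 are legacy_record_version; bytes 3-4 are the length.
        length = int.from_bytes(stream[pos + 3:pos + 5], "big")
        if pos + 5 + length > len(stream):
            break  # incomplete record; a real parser would wait for more bytes
        yield content_type, stream[pos + 5:pos + 5 + length]
        pos += 5 + length


def skip_early_data_after_hrr(stream: bytes) -> Tuple[int, List[Tuple[int, bytes]]]:
    """Skip encrypted records until the first cleartext record arrives.

    Returns the number of skipped early data records and the records kept,
    starting with the one that carries the second ClientHello.
    """
    skipped = 0
    kept: List[Tuple[int, bytes]] = []
    for content_type, fragment in iter_records(stream):
        if not kept and content_type == APPLICATION_DATA:
            skipped += 1          # rejected early data, never decrypted
            continue
        if not kept and content_type == CHANGE_CIPHER_SPEC:
            continue              # compatibility-mode record, simply dropped
        kept.append((content_type, fragment))
    return skipped, kept
```

Note that the skipped records are discarded without any attempt at decryption, yet every appliance on the network path sees the same bytes that iter_records parses here.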

The server is sometimes required to terminate or, synonymously, close the TLS connection with a particular kind of Error Alert when it detects a particular kind of error. If the error is related to early data, the server may close the connection as the early data is being transmitted or is about to be transmitted, and this could be viewed as an additional way of rejecting the early data, a kind of hard reject as opposed to the two kinds of soft reject above. However RFC 8446 consistently uses the term “reject” to refer to one of the above soft rejects, and I will do the same here. So when I talk below about the server rejecting early data it should be understood that the TLS connection and a fortiori the underlying TCP connection remain open and the flow of early data is not suppressed or interrupted.

The main use case for early data is session resumption. The server can send a NewSessionTicket message in the course of a TLS connection, viewed as the initial connection of a “session”, and thereby establish a PSK with the client to be used in a subsequent connection, viewed as a resumption of the session. If the NewSessionTicket message includes an early_data extension, the client is allowed to use the PSK as the primary PSK in the subsequent connection and send early data encrypted under the PSK. The early_data extension contains a max_early_data_size value that limits the number of bytes of early data that the client is allowed to send. Section 4.6.1 of RFC 8446 stipulates that “a server receiving more than max_early_data_size bytes of 0-RTT data SHOULD terminate the connection with an "unexpected_message" alert”. The RFC does not say what the server should do if the client encrypts early data under a PSK established by a NewSessionTicket that does not have an early_data extension. It would make sense to view the absence of the early_data extension as equivalent to a max_early_data_size of zero bytes, and also terminate the connection in that case with an unexpected_message alert.

The NewSessionTicket message carries a “ticket”, created by the server and opaque to the client, to be used as the PSK identity. The PSK itself is derived from the resumption master secret and a ticket_nonce also contained in the message. Because the ticket is opaque to the client, its construction is not subject to interoperability requirements and is not specified by RFC 8446, although options are suggested in Section 8. The ticket could be a randomly generated lookup key into a database where the server stores the PSK, or it could be a “self-contained”, “self-encrypted” and “self-authenticated” data structure with an encrypted portion comprising the PSK and additional encrypted information, and a plaintext portion comprising a randomly generated reference to ticket protection keys used to encrypt and authenticate the ticket. RFC 5077, obsoleted by RFC 8446, discusses a recommended ticket construction for earlier versions of TLS, where the same reference and ticket protection keys are used for multiple tickets.

The NewSessionTicket message also has a ticket_lifetime value, and Section 4.2.11.1 stipulates that “Clients MUST NOT attempt to use tickets which have ages greater than the "ticket_lifetime" value which was provided with the ticket”. This is not enforced by the server, but allows the server to forget the ticket after it has expired, either by deleting the PSK from a PSK database and thus causing the ticket to not be found among the lookup keys, or by deleting the ticket protection keys after they have not been used to encrypt any recent tickets, causing the reference to not be found among the references to protection keys. In either case, the ticket and the PSK are said to become “unknown to the server”.
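The database option can be sketched as follows, to show how a ticket becomes “unknown to the server” once it has expired. This is an illustration under assumptions, not a description of any particular server: the class and method names are hypothetical, the ticket is a 16-byte random lookup key, and the clock is injectable so that expiry can be exercised without waiting.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class TicketDatabase:
    """PSK store keyed by randomly generated tickets (hypothetical sketch)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # ticket -> (PSK, expiry time in seconds)
        self._entries: Dict[bytes, Tuple[bytes, float]] = {}

    def issue(self, psk: bytes, ticket_lifetime_s: int) -> bytes:
        """Create a ticket to be sent in a NewSessionTicket message."""
        ticket = secrets.token_bytes(16)
        self._entries[ticket] = (psk, self._clock() + ticket_lifetime_s)
        return ticket

    def purge_expired(self) -> None:
        """Forget tickets whose lifetime has elapsed."""
        now = self._clock()
        self._entries = {t: e for t, e in self._entries.items() if e[1] > now}

    def lookup(self, ticket: bytes) -> Optional[bytes]:
        """Return the PSK, or None if the ticket is unknown to the server."""
        self.purge_expired()
        entry = self._entries.get(ticket)
        return entry[0] if entry is not None else None
```

From the point of view of lookup, an expired ticket and a ticket that was never issued are indistinguishable: both return None.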

Causing Rejection of Early Data

An attacker who wishes to use rejected early data as a C2 channel to a compromised network appliance can use a variety of methods to cause the server to reject the early data instead of closing the connection, including the following.

Bogus Unknown Ticket

In Section 4.2.10, which discusses early data, RFC 8446 does not say what the server should do when the primary PSK is unknown and the client sends early data. The PSK could have been established by a NewSessionTicket that did not have an early_data extension; this suggests that the connection should be closed with an “unexpected_message” alert. But there are multiple hints in other sections of RFC 8446, and in RFC 8470 (concerned with the use of early data in TLS 1.3 connections that carry HTTP requests) suggesting that the server should reject the early data instead of closing the connection. For example, Section 8.1 of RFC 8446, concerned with anti-replay mitigation, mentions that “if an unknown ticket is provided, the server would then fall back to a full handshake”, i.e. would reject the early data.

Therefore an attacker can cause a server that behaves as suggested to reject early data by using a bogus unknown ticket, pretending to be either a randomly generated PSK database look-up key, or a self-encrypted ticket with a randomly generated reference to ticket protection keys.

Missing Cipher Suite

Each PSK is associated with a cipher suite, which, in the session resumption case, is the one in effect in the earlier connection when the NewSessionTicket message was sent. However, the client separately offers a list of cipher suites and a list of PSKs in the ClientHello message, and the server independently selects a cipher suite and a PSK from the lists in the ServerHello message.

In TLS 1.3, a cipher suite specifies an AEAD algorithm and a hash function used for the derivation of traffic keys as specified in Section 7.1 and Section 7.3 of the RFC.

Section 4.2.11, concerned with the pre_shared_key extension, requires the server to select a cipher suite and a PSK that are “compatible”, in the sense that the cipher suite specifies the same hash function as the cipher suite associated with the PSK, but not necessarily the same AEAD algorithm. The “Implementor's note” suggests that the server might want to select the cipher suite first, then choose a PSK among those compatible with the selected cipher suite. Then it adds that “If no acceptable PSKs are found, the server SHOULD perform a non-PSK handshake if possible”.

Section 4.2.10, concerned with the early_data extension, further stipulates that in order to accept early data the server must select in the ServerHello message the primary PSK and the cipher suite associated with it, not just any compatible cipher suite, and notes that “These requirements are a superset of those [specified in Section 4.2.11] needed to perform a 1-RTT handshake using the PSK in question”. Section 4.2.10 does not explicitly say what the server should do if it cannot fulfill the requirement because the client has not offered the associated cipher suite. But, for consistency with the recommendation of Section 4.2.11, it seems clear that it should reject the early data instead of closing the connection, and either select any valid PSK and compatible cipher suite combination, or perform a non-PSK handshake.

Thus an attacker can cause the server to reject early data by using a known PSK as the primary PSK but not offering the associated cipher suite.
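The interplay of the two sections can be made concrete with a small decision function. It is a simplified sketch under assumptions: the function name and outcome strings are hypothetical, the server is assumed to prefer the cipher suite associated with the primary PSK whenever the client offers it, and the other conditions for accepting early data in Section 4.2.10 are ignored.

```python
from typing import List, Optional, Tuple

# Hash function specified by each TLS 1.3 cipher suite used in this sketch.
SUITE_HASH = {
    0x1301: "sha256",  # TLS_AES_128_GCM_SHA256
    0x1302: "sha384",  # TLS_AES_256_GCM_SHA384
    0x1303: "sha256",  # TLS_CHACHA20_POLY1305_SHA256
}


def select_suite(offered: List[int], supported: List[int],
                 primary_psk_suite: Optional[int]) -> Tuple[Optional[int], str]:
    """Return (selected cipher suite, outcome) for one ClientHello.

    primary_psk_suite is the suite associated with the primary PSK,
    or None when the primary PSK is unknown to the server.
    """
    common = [s for s in supported if s in offered]  # server preference order
    if not common:
        return None, "handshake_failure"
    if primary_psk_suite is not None:
        if primary_psk_suite in common:
            # Same suite as the PSK: early data may be accepted.
            return primary_psk_suite, "psk_handshake_early_data_acceptable"
        for suite in common:
            # Same hash but different AEAD: PSK usable, early data is not.
            if SUITE_HASH.get(suite) == SUITE_HASH.get(primary_psk_suite):
                return suite, "psk_handshake_early_data_rejected"
    # No compatible suite: fall back to a non-PSK handshake.
    return common[0], "non_psk_handshake_early_data_rejected"
```

For example, with a primary PSK associated with 0x1301, a client that offers only 0x1303 obtains a PSK handshake with the early data rejected, and a client that offers only 0x1302 obtains a non-PSK handshake, again with the early data rejected.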

ClientHello Replay Check

Early data is not protected against replay attacks, and Section 8 proposes several mitigations. Two of them are required and can be leveraged to cause the server to reject early data. One of the required anti-replay mitigations is checking for a ClientHello replay.

The preamble of Section 8 requires rejection of replays to the same server instance within a server system, and suggests that a server can accomplish that by “locally recording data from recently received ClientHellos and rejecting repeats”. This suggestion relies on the fact that replaying early data requires replaying the preceding ClientHello message because a transcript of the ClientHello message is used in the derivation of the early traffic keys.

An attacker-controlled client can cause a server that follows the suggestion to reject malicious early data by resuming a session with a ClientHello message followed by benign early data, then, while the TCP connection used for the resumption is still open, immediately sending over the same connection a replay of the ClientHello message followed by the malicious early data. The same server will receive the resumption and the replay, and will reject the malicious early data without looking at it based on the fact that the ClientHello message is a replay.
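A server-side check of the kind suggested by the preamble of Section 8 might look like the following sketch. The class name, the use of a SHA-256 digest of the whole ClientHello as the recorded data, and the bounded size of the cache are all assumptions made for illustration; a deployed server would more likely bound the cache by a time window, as discussed in Section 8.2.

```python
import hashlib
from collections import OrderedDict


class ClientHelloReplayCache:
    """Remember digests of recent ClientHellos and flag repeats (sketch)."""

    def __init__(self, max_entries: int = 10_000) -> None:
        self._seen: "OrderedDict[bytes, None]" = OrderedDict()
        self._max_entries = max_entries

    def is_replay(self, client_hello: bytes) -> bool:
        """Return True if this exact ClientHello was recorded before."""
        digest = hashlib.sha256(client_hello).digest()
        if digest in self._seen:
            return True  # early data following this ClientHello is rejected
        self._seen[digest] = None
        if len(self._seen) > self._max_entries:
            self._seen.popitem(last=False)  # evict the oldest digest
        return False
```

The check looks only at the ClientHello, never at the early data that follows it, which is why a repeated ClientHello is enough to have arbitrary early data rejected.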

ClientHello Freshness Check

The other required anti-replay mitigation is the “Freshness Check” of Section 8.3. The requirement is stated as follows in Section 4.2.10:

For PSKs provisioned via NewSessionTicket, a server MUST validate that the ticket age for the selected PSK identity (computed by subtracting ticket_age_add from PskIdentity.obfuscated_ticket_age modulo 2^32) is within a small tolerance of the time since the ticket was issued (see Section 8). If it is not, the server SHOULD proceed with the handshake but reject 0-RTT, and SHOULD NOT take any other action that assumes that this ClientHello is fresh.

Based on the less confusing explanation in Section 8.3, this means that the server must check that the age of the ticket as reported by the client in the ClientHello message is within a small tolerance of the age of the ticket as computed by the server. The server computes its view of the age of the ticket by subtracting a creation timestamp from the current wall clock time. If the ticket is self-contained the timestamp must be included in the ticket as part of the additional encrypted information. The client computes its view of the age of the ticket by subtracting the time when it received the ticket from wall clock time when it sends the ClientHello message. It obfuscates its view of the age by adding to it a ticket_age_add value included in the NewSessionTicket message, and sends the resulting obfuscated_ticket_age in the ClientHello message, coupled with the PSK identity. If the ticket is self-contained, the ticket_age_add must also be included in the ticket as part of the additional encrypted information so the server can subtract it from the obfuscated_ticket_age.

If the reported age is substantially less than the computed age, the ClientHello message and subsequent early data must be a replay of a ClientHello message and early data sent earlier, when the computed age would have been close to the reported age. A replay attacker who wants the data to be accepted cannot defeat this check by modifying the obfuscated_ticket_age in the ClientHello message because, as noted above in connection with the ClientHello Replay Check, a transcript of the ClientHello message is used in the derivation of the early traffic keys, and hence the modification would prevent the server from decrypting the early data.

But while a replay attacker cannot modify the ClientHello message, a backdoor attacker can, and may report a ticket age that is substantially less than the current age of the ticket computed by the server. For example, the attacker can pretend that the ClientHello message was created shortly after the ticket was received, by reporting an obfuscated_ticket_age slightly greater than ticket_age_add, and send the message substantially later than the ticket was received but before it expires. The discrepancy between the reported age and the computed age will then cause the server to reject the early data.
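The arithmetic of the check can be illustrated with a short sketch. It follows the simplified description given above rather than the exact procedure of Section 8.3; the function names are hypothetical, times are in milliseconds, and the tolerance of 10 seconds is an arbitrary value chosen for the example.

```python
MOD = 2 ** 32


def reported_ticket_age_ms(obfuscated_ticket_age: int, ticket_age_add: int) -> int:
    """Recover the client's view of the ticket age (modulo 2^32)."""
    return (obfuscated_ticket_age - ticket_age_add) % MOD


def is_fresh(obfuscated_ticket_age: int, ticket_age_add: int,
             issued_at_ms: int, now_ms: int, tolerance_ms: int = 10_000) -> bool:
    """Return True if the reported age is close to the server-computed age."""
    reported = reported_ticket_age_ms(obfuscated_ticket_age, ticket_age_add)
    computed = now_ms - issued_at_ms  # server's view of the ticket age
    return abs(computed - reported) <= tolerance_ms


# An honest client resumes 5 seconds after receiving the ticket.
ticket_age_add = 0xFFFFFF00
honest = (5_000 + ticket_age_add) % MOD
print(is_fresh(honest, ticket_age_add, issued_at_ms=1_000_000, now_ms=1_005_200))

# A client reports an age of 50 ms but sends the ClientHello an hour later.
stale = (50 + ticket_age_add) % MOD
print(is_fresh(stale, ticket_age_add, issued_at_ms=1_000_000, now_ms=4_600_000))
```

The first call returns True and the second returns False, so in the second case the server proceeds with the handshake but rejects the early data.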

Choice of SNI

Section 3 of RFC 8470, the RFC concerned with the use of early data in HTTP, states that one technique that the server can use to mitigate the risk of replay is to reject early data at the TLS layer. Then it adds that “A server cannot selectively reject early data, so this results in all requests sent in early data being discarded”.

This, however, does not seem to be true. There is no requirement in RFC 8446 that a server should either accept all early data or reject all early data. And a server may be able to use other information besides the early data itself to decide whether a request is idempotent or not. In particular, a Server Name Indication (SNI) may reveal whether the named server being addressed is one that hosts static data or dynamic data, with requests for static data being idempotent and requests for dynamic data being non-idempotent. The server can then selectively reject early data in requests addressed to named servers that host dynamic data.

A named server that handles non-idempotent requests may have a policy of not sending tickets that allow early data. A client that sends a request addressing such a server will not get such a ticket, and on a subsequent request to the same named server using session resumption, it will not send early data. However, Section 4.6.1 of RFC 8446 allows session resumption with a different SNI. Therefore an attacker-controlled client may obtain a ticket that allows early data from a named server that handles idempotent requests and use it to send early data to a named server that handles non-idempotent requests without raising suspicion. The latter named server will then reject the early data.

Attack Scenario

To create a backdoor into an enterprise network an attacker may begin by implanting malware in one or more network appliances. A variety of methods have been used by attackers to do that, ranging from exploiting unpatched or zero-day vulnerabilities in the software or firmware of an appliance to attacking the hardware or software supply chain, or even intercepting the shipment of the appliance and implanting firmware into it before it is delivered. Vulnerabilities in network appliances are very common, as can be seen by the hundreds of vulnerabilities that have been found in the firmware of one of the vendors.

Having compromised one or more network appliances, the attacker may perform network reconnaissance to find one or more servers reachable from the internet over a network path that includes a compromised appliance. If no such server and compromised appliance can be found, the attacker may try to perform lateral movement to compromise a better positioned network appliance.

If multiple servers are found, the attacker may try to choose one that enables session resumption, preferably by sending NewSessionTicket messages with tickets that allow early data. To find out whether a server sends NewSessionTicket messages, the attacker may modify a commonly used open-source browser such as Firefox to provide the attacker with interactive control over the behavior of the TLS engine. If a server is found that sends NewSessionTicket messages, then the attacker can use any of the methods described above to send C2 instructions to an implant as early data that the server rejects.

If the only server or servers found do not send NewSessionTicket messages, then the attacker can use a bogus unknown ticket as the identity of the primary PSK in the ClientHello message before sending the early data. Offering a PSK to a server that does not use PSKs should be highly suspicious, but Section 4.2.11 of RFC 8446 recommends that “any unknown PSKs (e.g., ones not in the PSK database or encrypted with an unknown key) SHOULD simply be ignored”, so a server that behaves as recommended will not detect the attack; and an intrusion detection system that is not aware of the fact that rejected early data can be dangerous will not raise a red flag.

Having found a server, the attacker may want to verify that it can communicate with a compromised appliance. To do that it may use the modified open-source browser to send early data with a payload that instructs any implant on the network path to send a beacon to the attacker over the internet. However sending the beacon entails a risk of detection by an intrusion detection system, so the attacker may want to omit this step and trust that the malware has been successfully implanted and the network reconnaissance step has correctly determined that there is a compromised appliance on the network path to the server.

After finding a suitable server and compromised appliance, the attacker can operate the implant in the appliance as a backdoor into the enterprise network, sending early data payloads to customize the implant if needed, to update it when needed, or to launch attacks.

Mitigations

In this section I will focus for simplicity on the session resumption use case, where PSK identities are tickets sent by NewSessionTicket messages, and I will refer to the identity of the primary PSK as the primary ticket.

The first mitigation that comes to mind against the use of rejected early data as a C2 conduit to a compromised appliance, besides not using TLS 1.3, is to not use early data. This, however, is easier said than done.

The server can prohibit early data by never including an early_data extension in a NewSessionTicket message. This allows the server to detect an attack when it receives a ClientHello message with an early_data extension, and attempt to prevent the attacker-controlled client from sending early data by closing the connection. The server could close not only the TLS connection but also the underlying TCP connection, disregarding the statement in Section 6.1: “No part of this standard should be taken to dictate the manner in which a usage profile for TLS manages its data transport, including when connections are opened or closed.”

This mitigation, however, may not be fully effective. If the server closes the TCP connection by sending a FIN segment, the attacker-controlled client may leave the connection half open and proceed to send early data. If the server sends an RST segment the client may ignore it and send early data that will not be taken up by the server but will be seen by the compromised appliance.

Another not fully effective mitigation is for the server to always accept early data, but not process it until the client has sent its Finished message. This is a dual-purpose mitigation, which is proposed in Section 3 of RFC 8470 for protection against replay attacks. It protects against replay because the client's Finished message depends on a transcript hash of messages that include the ServerHello message, and therefore cannot be replayed; however, the early data is no longer 0-RTT from the point of view of the server's application layer, which does not receive it from the TLS layer until after one round trip.

While this dual-purpose mitigation is effective against replay, it is not fully effective against the use of rejected early data encrypted under an unknown key by an attacker-controlled client.

It does allow the server to detect the attack when it is not able to decrypt the early data. Furthermore, Section 4.2.10 of the RFC requires that, “if the server fails to decrypt a 0-RTT record following an accepted "early_data" extension, it MUST terminate the connection with a "bad_record_mac" alert as per Section 5.2”. Any compliant server will do this, without having to be modified to implement the mitigation. However a compliant server will not close the underlying TCP connection, and the attacker-controlled client will be free to ignore the bad_record_mac alert and continue to send early data that the implant will receive and act upon.

A fully effective defense against the use of early data to access a backdoor from an external client requires the TCP connection to be closed by the gateway at the boundary of the enterprise network if there is a chance that the early data may be malicious.

A server that prohibits early data by never sending a NewSessionTicket with an early_data extension knows as soon as it receives a ClientHello message with an early_data extension that the early data that follows is going to be malicious. It can then notify an intrusion prevention system, which can ask the gateway to close the TCP connection.
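Detecting the early_data extension requires only a walk over the cleartext ClientHello, as in the following sketch. The function names are hypothetical and the input is assumed to be a complete, well-formed ClientHello handshake message starting with its 4-byte handshake header; a production parser would need bounds checks on every length field.

```python
from typing import List

PRE_SHARED_KEY = 41  # ExtensionType values from RFC 8446, Section 4.2
EARLY_DATA = 42


def client_hello_extension_types(msg: bytes) -> List[int]:
    """List the extension types of a ClientHello handshake message."""
    if len(msg) < 4 or msg[0] != 1:  # HandshakeType client_hello = 1
        raise ValueError("not a ClientHello handshake message")
    body = msg[4:4 + int.from_bytes(msg[1:4], "big")]
    pos = 2 + 32                                          # legacy_version, random
    pos += 1 + body[pos]                                  # legacy_session_id
    pos += 2 + int.from_bytes(body[pos:pos + 2], "big")   # cipher_suites
    pos += 1 + body[pos]                                  # legacy_compression_methods
    end = pos + 2 + int.from_bytes(body[pos:pos + 2], "big")
    pos += 2
    types = []
    while pos + 4 <= end:
        types.append(int.from_bytes(body[pos:pos + 2], "big"))
        pos += 4 + int.from_bytes(body[pos + 2:pos + 4], "big")
    return types


def announces_early_data(msg: bytes) -> bool:
    """True if the ClientHello announces that early data will follow."""
    return EARLY_DATA in client_hello_extension_types(msg)
```

A server with the policy described above could call announces_early_data on each incoming ClientHello and notify the intrusion prevention system when it returns True.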

A server that does not forget tickets until well past their lifetime should not receive a ClientHello message with an unknown primary ticket from a client that conforms to the above-mentioned requirement in Section 4.2.11.1 of the RFC to not use expired tickets. When presented with an unknown primary ticket it can therefore ask the intrusion prevention system to close the TCP connection at the gateway.

When a ClientHello message is received with an early_data extension and a valid primary ticket, it should be possible to decrypt the early data that follows using the AEAD algorithm specified by the cipher suite associated with the ticket and the early traffic keys derived from the ticket. If the early data cannot be decrypted, or parts of it cannot be decrypted, it must be malicious and the TCP connection can be closed at the gateway. If the primary ticket allows early data and the early data can be decrypted, there is no reason to think that the early data may be malicious; however, it should be inspected to look for any indications of compromise even if it is rejected.

Clearly, the TLS server should not have to decrypt and/or inspect early data that it intends to reject. It should instead offload those tasks to an intrusion detection system, which could in turn invoke an intrusion prevention system if the early data is deemed malicious; the intrusion prevention system could then ask the boundary gateway to close the TCP connection.

The best way to do this is to combine protection against malicious early data with a TLS visibility solution such as the ST variant or the 2V variant of the Pomcor visibility solution, where a passive visibility middlebox is tasked with decrypting TLS traffic. As described in earlier posts, in ST and 2V the visibility middlebox already decrypts honest early data whether accepted or rejected, and can pass the plaintext to an intrusion detection system that can alert the intrusion prevention system if it finds indications of compromise.

In both the ST variant and the 2V variant the server transmits to the visibility middlebox the decryption tools it needs to decrypt TLS traffic: either the traffic secrets in the case of ST, or the protection states in the case of 2V. The following additional functionality can allow the middlebox to protect against malicious early data in addition to providing traffic visibility:

If the ClientHello message has an early_data extension but the primary ticket is unknown, the server tells the middlebox that no decryption tools are available for decrypting the early data because the primary ticket is unknown. The middlebox then alerts the intrusion prevention system, which causes the boundary gateway to close the TCP connection.
If the decryption tools for the early data are available, but the middlebox is not able to decrypt the early data in its entirety, the middlebox alerts the intrusion prevention system, in addition to passing the non-decrypted or partially decrypted early data to an intrusion detection system. The intrusion prevention system then causes the boundary gateway to close the TCP connection as in the previous case.
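The two rules above, together with the ordinary case where the early data decrypts successfully, can be summarized as a decision function. This is only a sketch of the logic: the names are hypothetical, and the inputs are assumed to be supplied by the server (availability of decryption tools) and by the middlebox's own decryption attempt.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NOTHING_TO_DO = "no early data announced"
    ALERT_IPS = "alert IPS so the gateway closes the TCP connection"
    ALERT_IPS_AND_FORWARD = "alert IPS and pass undecrypted data to IDS"
    FORWARD_PLAINTEXT = "pass plaintext to IDS for inspection"


def middlebox_action(early_data_announced: bool,
                     tools_available: bool,
                     fully_decrypted: Optional[bool] = None) -> Action:
    """Decide what the visibility middlebox does with a connection's early data."""
    if not early_data_announced:
        return Action.NOTHING_TO_DO
    if not tools_available:
        # Primary ticket unknown to the server: nothing can be decrypted.
        return Action.ALERT_IPS
    if not fully_decrypted:
        # Tools were available but some or all records failed to decrypt.
        return Action.ALERT_IPS_AND_FORWARD
    # Honest early data, accepted or rejected: inspect the plaintext.
    return Action.FORWARD_PLAINTEXT
```

In the first two alerting cases the connection is closed at the boundary gateway; in the last case the decision is left to the intrusion detection system that inspects the plaintext.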