Double encoding

Double encoding is the act of encoding data twice in a row using the same encoding scheme. It is usually used as an attack technique against applications to bypass authorization schemes or security countermeasures that filter, blacklist or sanitize user input. In such attacks, characters of the payload that the countermeasures consider illegal are replaced with their double-encoded forms.

Double URI-encoding is a special type of double encoding in which data is URI-encoded twice in a row. Double URI-encoding attacks have been used to bypass authorization schemes and security countermeasures against code injection, directory traversal, cross-site scripting (XSS) and SQL injection.

Description

In double encoding, data is encoded twice in a row using the same encoding scheme; that is, the double-encoded form of data X is Encode(Encode(X)), where Encode is an encoding function.^[1]

Double encoding is usually used as an attack technique against applications to bypass authorization schemes or security countermeasures that filter, blacklist or sanitize user input.^[2] In such attacks, characters of the payload that the target countermeasures consider illegal are replaced with their double-encoded forms.^[3] If a double-encoded payload is decoded only once before it reaches a countermeasure, the countermeasure sees the single-encoded payload. Because one level of encoding is still present, the countermeasure may not recognize the payload as illegal and may let it through. If the target system later performs another round of decoding, it recovers the payload the attacker intended and may act on it, so the countermeasure is effectively bypassed.^[4]

Double URI-encoding

Double URI-encoding, also referred to as double percent-encoding, is a special type of double encoding in which data is URI-encoded twice in a row.^[5] In other words, the double-URI-encoded form of data X is URI-encode(URI-encode(X)).^[5] For example, to calculate the double-URI-encoded form of <, first < is URI-encoded as %3C, which in turn is URI-encoded as %253C; that is, double-URI-encode(<) = URI-encode(URI-encode(<)) = URI-encode(%3C) = %253C.^[6] As another example, to calculate the double-URI-encoded form of ../, first ../ is URI-encoded as %2E%2E%2F, which in turn is URI-encoded as %252E%252E%252F; that is, double-URI-encode(../) = URI-encode(URI-encode(../)) = URI-encode(%2E%2E%2F) = %252E%252E%252F.^[7]
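The two worked examples above can be reproduced with a short PHP sketch. PHP's built-in rawurlencode leaves unreserved characters such as . unencoded, so the sketch assumes a hypothetical helper, percent_encode_all, that encodes every byte in the first round. The function names are illustrative and not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): percent-encode every byte,
// including unreserved characters such as "." that rawurlencode() leaves as-is.
function percent_encode_all(string $data): string
{
    $out = '';
    for ($i = 0; $i < strlen($data); $i++) {
        $out .= sprintf('%%%02X', ord($data[$i]));
    }
    return $out;
}

// Hypothetical helper: the second round only needs to encode the "%" signs
// produced by the first round, which rawurlencode() does.
function double_percent_encode(string $data): string
{
    return rawurlencode(percent_encode_all($data));
}

echo double_percent_encode('<'), "\n";    // %253C
echo double_percent_encode('../'), "\n";  // %252E%252E%252F

// Decoding twice recovers the original data.
echo rawurldecode(rawurldecode('%252E%252E%252F')), "\n";  // ../
```

The second round uses rawurlencode on purpose: encoding every byte again would also encode the hexadecimal digits and would not give the %25XX form shown in the examples.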

Double URI-encoding is usually used as an attack technique against web applications and web browsers to bypass authorization schemes and security countermeasures that filter, blacklist or sanitize user input.^[8]^[9] For example, because . and its URI-encoded form %2E are used in some directory traversal attacks, security countermeasures usually consider them illegal.^[10] However, %252E, the double-URI-encoded form of ., is less likely to be considered illegal and may therefore pass through the countermeasures. Later, when the target system builds the file path, it might use the double-URI-decoded form of %252E, which is ., a character the countermeasures would have rejected.^[11]
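The filtering behaviour described above can be illustrated with a minimal sketch. It assumes a hypothetical blacklist function, naive_filter_allows, that inspects the raw input and rejects a literal dot and its single-encoded form; real filters vary widely, so this only illustrates the idea.

```php
<?php
// Hypothetical blacklist filter: rejects a literal dot and its
// single-URI-encoded form (%2E, in either letter case).
function naive_filter_allows(string $input): bool
{
    return strpos($input, '.') === false
        && stripos($input, '%2E') === false;
}

$candidates = [
    '../secret',              // plain
    '%2E%2E%2Fsecret',        // URI-encoded once
    '%252E%252E%252Fsecret',  // URI-encoded twice
];

foreach ($candidates as $raw) {
    $verdict = naive_filter_allows($raw) ? 'allowed' : 'blocked';
    // What a target that decodes twice would end up using.
    $final = rawurldecode(rawurldecode($raw));
    echo "$raw => $verdict, decodes to $final\n";
}
```

Only the third candidate is allowed by this filter, yet all three decode to ../secret once the target has decoded them twice.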

Double URI-encoding attacks have been used to bypass authorization schemes and security countermeasures against code injection, directory traversal, XSS and SQL injection.^[12]

Prevention

If some user input is decoded twice using the same decoding scheme, once before and once after an authorization scheme or security countermeasure, a double encoding attack may bypass whatever check sits between the two decoding operations.^[13] Thus, to prevent double encoding attacks, all decoding operations on user input should occur before the authorization schemes and security countermeasures that filter, blacklist or sanitize that input.^[14]
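The effect of the ordering can be sketched by running the same two decoding operations in both orders. The function names below are hypothetical, and contains_traversal stands in for any filter or sanitizer.

```php
<?php
// Hypothetical check standing in for any filter or sanitizer.
function contains_traversal(string $value): bool
{
    return strpos($value, '../') !== false
        || strpos($value, '..\\') !== false;
}

// Vulnerable order: decode, check, decode.
// Returns the value that reaches the application, or null if rejected.
function check_between_decodes(string $raw): ?string
{
    $value = rawurldecode($raw);
    if (contains_traversal($value)) {
        return null;
    }
    return rawurldecode($value);
}

// Safer order: the same two decoding operations, both before the check.
function check_after_decodes(string $raw): ?string
{
    $value = rawurldecode(rawurldecode($raw));
    return contains_traversal($value) ? null : $value;
}

$payload = '%252E%252E%252Fetc%252Fpasswd';

var_dump(check_between_decodes($payload));  // string(13) "../etc/passwd"
var_dump(check_after_decodes($payload));    // NULL
```

In the first function the check only ever sees %2E%2E%2Fetc%2Fpasswd, so the traversal sequence appears after the check has run. In the second, the check sees the final value and rejects it.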

Examples

PHP

In the PHP programming language, the values in $_GET and $_REQUEST are already URI-decoded, so programmers should avoid calling the urldecode function on data read from them.^[15] Calling urldecode on such data decodes it a second time and may therefore open the door to double encoding attacks.
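The redundant decoding can be observed from the command line with a small sketch. It uses parse_str as a stand-in for the way PHP fills $_GET from a query string; that substitution is an assumption made only to keep the example runnable without a web server.

```php
<?php
// parse_str() URL-decodes values in the same way as PHP does for $_GET,
// so it is used here in place of a real HTTP request.
$query = 'name=%253Cb%253E';   // double-URI-encoded "<b>"
parse_str($query, $params);

$once  = $params['name'];      // already decoded once by the parser
$twice = urldecode($once);     // the redundant second decoding

echo $once, "\n";    // %3Cb%3E
echo $twice, "\n";   // <b>
```

Any check placed between the two steps would see %3Cb%3E, not the markup that the second decoding produces.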

Directory traversal

In the following PHP program, the value of $_GET["file"] is used to build the path of the file to be sent to the user. This opens the possibility of directory traversal attacks that carry their payload in the HTTP GET parameter file. As a countermeasure, the program searches the value it reads from $_GET["file"] for directory traversal sequences and exits if it finds one. However, after this countermeasure, the program URI-decodes the value again, which makes it vulnerable to double encoding attacks.

<?php
/* Note that $_GET is already URL-decoded */
$path = $_GET["file"];

/* Countermeasure */
/* Exit if user input contains directory traversal sequence */
if (strpos($path, "../") !== false || strpos($path, "..\\") !== false)
{
    exit("Directory traversal attempt detected.");
}

/* Vulnerable: URI-decode user input once again, after the check */
$path = urldecode($path);

/* Build file path to be sent using user input */
echo file_get_contents("uploads/" . $path);

This countermeasure blocks payloads such as ../../../../etc/passwd and its URI-encoded form %2E%2E%2F%2E%2E%2F%2E%2E%2F%2E%2E%2Fetc%2Fpasswd. However, %252E%252E%252F%252E%252E%252F%252E%252E%252F%252E%252E%252Fetc%252Fpasswd, the double-URI-encoded form of ../../../../etc/passwd, bypasses it. When this double-URI-encoded payload is sent, the value of $_GET["file"] is %2E%2E%2F%2E%2E%2F%2E%2E%2F%2E%2E%2Fetc%2Fpasswd, which contains no directory traversal sequence and therefore passes the check. The value is then passed to the urldecode function, which returns ../../../../etc/passwd, resulting in a successful attack.
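One possible way to harden the program above is sketched below. It is not a definitive fix: the function read_upload and the temporary directory are hypothetical, and the realpath containment check is extra hardening that the original program does not have. The essential change is that no decoding happens after the check.

```php
<?php
// Hypothetical helper: returns the file contents, or null if rejected.
// $name is assumed to come from $_GET["file"], already URI-decoded by PHP.
function read_upload(string $name, string $baseDir): ?string
{
    // The check runs on the final value; nothing decodes it afterwards.
    if (strpos($name, '../') !== false || strpos($name, '..\\') !== false) {
        return null;
    }

    // Extra hardening (an addition): resolve the path and require it
    // to stay inside $baseDir.
    $base = realpath($baseDir);
    $full = realpath($baseDir . DIRECTORY_SEPARATOR . $name);
    if ($base === false || $full === false
        || strpos($full, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }

    $data = file_get_contents($full);
    return $data === false ? null : $data;
}

// Usage with a throwaway directory standing in for "uploads/".
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'uploads_demo_' . uniqid();
mkdir($dir);
file_put_contents($dir . DIRECTORY_SEPARATOR . 'hello.txt', 'hi');

var_dump(read_upload('hello.txt', $dir));                       // string(2) "hi"
var_dump(read_upload('../../etc/passwd', $dir));                // NULL
var_dump(read_upload('%2E%2E%2F%2E%2E%2Fetc%2Fpasswd', $dir));  // NULL

unlink($dir . DIRECTORY_SEPARATOR . 'hello.txt');
rmdir($dir);
```

With the second urldecode call removed, a single-encoded value such as %2E%2E%2F is treated as a literal file name and never turns into a traversal sequence.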

XSS

In the following PHP program, the value of $_GET["name"] is used to build a message to be shown to the user. This opens the possibility of XSS attacks that carry their payload in the HTTP GET parameter name. As a countermeasure, the program sanitizes the value it reads from $_GET["name"] with the htmlentities function. However, after this countermeasure, the program URI-decodes the value again, which makes it vulnerable to double encoding attacks.

<?php
/* Note that $_GET is already URL-decoded */
$name = $_GET["name"];

/* Countermeasure */
/* Sanitize user input via htmlentities */
$name = htmlentities($name);

/* Vulnerable: URI-decode user input once again, after sanitizing */
$name = urldecode($name);

/* Build message to be shown using user input */
echo "Hello " . $name;

This countermeasure neutralizes payloads such as <script>alert(1)</script> and its URI-encoded form %3Cscript%3Ealert%281%29%3C%2Fscript%3E. However, %253Cscript%253Ealert%25281%2529%253C%252Fscript%253E, the double-URI-encoded form of <script>alert(1)</script>, bypasses it. When this double-URI-encoded payload is sent, the value of $_GET["name"] is %3Cscript%3Ealert%281%29%3C%2Fscript%3E, which the htmlentities function leaves unchanged. The value is then passed to the urldecode function, which returns <script>alert(1)</script>, resulting in a successful attack.
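A hardened variant of the program above might look like the following sketch. The function name greeting is hypothetical, and the explicit ENT_QUOTES and UTF-8 arguments are a choice made for this example. The key point is that escaping is the last transformation before output and no decoding follows it.

```php
<?php
// Hypothetical helper: builds the greeting from a value assumed to come
// from $_GET["name"], which PHP has already URI-decoded.
function greeting(string $name): string
{
    // No urldecode() here. Escaping is the final step before output,
    // so nothing can undo it afterwards.
    return 'Hello ' . htmlentities($name, ENT_QUOTES, 'UTF-8');
}

echo greeting('<script>alert(1)</script>'), "\n";
// Hello &lt;script&gt;alert(1)&lt;/script&gt;

echo greeting('%3Cscript%3Ealert%281%29%3C%2Fscript%3E'), "\n";
// Hello %3Cscript%3Ealert%281%29%3C%2Fscript%3E
```

The second call corresponds to the double-URI-encoded attack: the value arrives single-encoded, is never decoded again, and is shown to the user as inert text.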

References

[1] "CAPEC-120: Double Encoding". capec.mitre.org. Retrieved 23 July 2022. The adversary utilizes a repeating of the encoding process for a set of characters (that is, character encoding a character encoding of a character) to obfuscate the payload of a particular request.
[2] "CAPEC-120: Double Encoding". capec.mitre.org. Retrieved 23 July 2022. This [double encoding] may allow the adversary to bypass filters that attempt to detect illegal characters or strings, such as those that might be used in traversal or injection attacks [...] For instance, by double encoding certain characters in the URL (e.g. dots and slashes) an adversary may try to get access to restricted resources on the web server or force browse to protected pages (thus subverting the authorization service). An adversary can also attempt other injection style attacks using this attack pattern: command injection, SQL injection, etc.
[3] "CAPEC-120: Double Encoding". capec.mitre.org. Retrieved 23 July 2022. This [double encoding] may allow the adversary to bypass filters that attempt to detect illegal characters or strings, such as those that might be used in traversal or injection attacks. [...] Try double-encoding for parts of the input in order to try to get past the filters.
[4] "Double Encoding". owasp.org. Retrieved 24 July 2022. By using double encoding it's possible to bypass security filters that only decode user input once. The second decoding process is executed by the backend platform or modules that properly handle encoded data, but don't have the corresponding security checks in place.
[5] Prasad, Prakhar (2016). Mastering Modern Web Penetration Testing. Packt Publishing. p. 11. ISBN 978-1785284588. Double percent encoding is the same as percent encoding with a twist that each character is encoded twice instead of once.
[6] Prasad, Prakhar (2016). Mastering Modern Web Penetration Testing. Packt Publishing. p. 11. ISBN 978-1785284588. So if I had to encode < using double encoding, I'll first encode it into its percent-encoded format, which is %3c and then again percent encode the % character. The result of this will be %253c.
[7] "Double Encoding". owasp.org. Retrieved 23 July 2022. For example, ../ (dot-dot-slash) characters represent %2E%2E%2F in hexadecimal representation. When the % symbol is encoded again, its representation in hexadecimal code is %25. The result from the double encoding process ../ (dot-dot-slash) would be %252E%252E%252F
[8] Prasad, Prakhar (2016). Mastering Modern Web Penetration Testing. Packt Publishing. p. 11. ISBN 978-1785284588. This technique [double percent encoding] comes in pretty handy when attempting to evade filters which attempt to blacklist certain encoded characters
[9] "CAPEC-120: Double Encoding". capec.mitre.org. Retrieved 23 July 2022. For instance, by double encoding certain characters in the URL (e.g. dots and slashes) an adversary may try to get access to restricted resources on the web server or force browse to protected pages (thus subverting the authorization service). An adversary can also attempt other injection style attacks using this attack pattern: command injection, SQL injection, etc.
[10] "CAPEC-120: Double Encoding". capec.mitre.org. Retrieved 23 July 2022. For example, a dot (.), often used in path traversal attacks and therefore often blocked by filters, could be URL encoded as %2E. However, many filters recognize this encoding and would still block the request.
[11] "CAPEC-120: Double Encoding". capec.mitre.org. Retrieved 23 July 2022. In a double encoding, the % in the above URL encoding would be encoded again as %25, resulting in %252E which some filters might not catch, but which could still be interpreted as a dot (.) by interpreters on the target.
[12] "CWE-174: Double Decoding of the Same Data". cwe.mitre.org. Observed Examples. Retrieved 24 July 2022.
[13] "CWE-174: Double Decoding of the Same Data". cwe.mitre.org. Retrieved 24 July 2022. The software decodes the same input twice, which can limit the effectiveness of any protection mechanism that occurs in between the decoding operations.
[14] "CWE-174: Double Decoding of the Same Data". cwe.mitre.org. Retrieved 24 July 2022. Inputs should be decoded and canonicalized to the application's current internal representation before being validated (CWE-180).
[15] "urldecode". PHP. Retrieved 23 July 2022.

External links

"Double Encoding". owasp.org. Retrieved 23 July 2022.
"CAPEC-120: Double Encoding". capec.mitre.org. Retrieved 23 July 2022.
"CWE-174: Double Decoding of the Same Data". cwe.mitre.org. Retrieved 24 July 2022.
