CWE-173

Name

Improper Handling of Alternate Encoding

Status

Draft

Published

2006-07-19 00:00 +00:00

Modified

2025-12-11 00:00 +00:00

Name: Improper Handling of Alternate Encoding

The product does not properly handle input that uses an alternate encoding that is valid for the control sphere to which the input is being sent.

General Information

Modes Of Introduction

Implementation

Applicable Platforms

Language

Class: Not Language-Specific (Undetermined)

Common Consequences

Scope	Impact	Likelihood
Access Control	Bypass Protection Mechanism

Potential Mitigations

Phase: Architecture and Design
Avoid making decisions based on names of resources (e.g. files) if those resources can have alternate names.
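One way to follow this advice for file paths is to resolve each name to a canonical form and base the decision on that form rather than on the string the caller supplied. The sketch below is illustrative only: the helper name is_within and the directory /srv/data are assumptions, not part of the CWE entry.

```python
import os


def is_within(base_dir: str, candidate: str) -> bool:
    """Return True if candidate resolves to a location inside base_dir.

    Both paths are canonicalized with os.path.realpath, so alternate
    spellings of the same resource ("a/../b", symlinks, redundant
    separators) are compared in one form.
    """
    base = os.path.realpath(base_dir)
    # os.path.join discards base when candidate is absolute, so an
    # absolute candidate is checked on its own merits.
    target = os.path.realpath(os.path.join(base, candidate))
    return os.path.commonpath([base, target]) == base


if __name__ == "__main__":
    print(is_within("/srv/data", "reports/../q1.txt"))
    print(is_within("/srv/data", "../etc/passwd"))
```

The check is made on the resolved path, so two different names for the same file lead to the same decision.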
Phase: Implementation

Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does.

When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue."

Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation. However, denylists can be useful for detecting potential attacks or determining which inputs are so malformed that they should be rejected outright.
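The "accept known good" strategy described above can be sketched as follows, reusing the color example. The set of accepted colors, the length limit, and the function name validate_color are assumptions made for illustration.

```python
import re

# Hypothetical business rule: only these values are accepted.
ALLOWED_COLORS = frozenset({"red", "blue", "green"})

# Syntax check: lowercase ASCII letters only, bounded length.
_SYNTAX = re.compile(r"[a-z]{1,16}")


def validate_color(value: str) -> str:
    """Return value unchanged if it is acceptable, else raise ValueError."""
    if not isinstance(value, str):
        raise ValueError("color must be a string")
    if _SYNTAX.fullmatch(value) is None:
        # Rejects encoded forms such as "%72ed" as well as oversized input.
        raise ValueError("color has invalid syntax")
    if value not in ALLOWED_COLORS:
        # "boat" is syntactically valid but fails the business rule.
        raise ValueError("color is not an accepted value")
    return value
```

Because the check lists what is allowed instead of what is forbidden, an alternate encoding of a forbidden value fails the same way as any other unexpected input.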

Phase: Implementation
Use and specify an output encoding that can be handled by the downstream component that is reading the output. Common encodings include ISO-8859-1, UTF-7, and UTF-8. When an encoding is not specified, a downstream component may choose a different encoding, either by assuming a default encoding or by automatically inferring which encoding is being used, which can be erroneous. When the encodings are inconsistent, the downstream component might treat some character or byte sequences as special, even if they are not special in the original encoding. Attackers might then be able to exploit this discrepancy and conduct injection attacks; they might even be able to bypass protection mechanisms that assume the original encoding is also being used by the downstream component.
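As a minimal sketch of this mitigation, the hypothetical helper below encodes a response body and declares the same encoding in a Content-Type header, so the reader does not have to guess. The helper name and the header dictionary are assumptions and are not tied to any particular web framework. The last lines show why guessing is risky: bytes that contain no angle brackets become markup if a consumer interprets them as UTF-7.

```python
def build_html_response(body: str, encoding: str = "utf-8") -> tuple[dict[str, str], bytes]:
    """Encode body and state the encoding explicitly for the consumer."""
    payload = body.encode(encoding)
    headers = {
        "Content-Type": f"text/html; charset={encoding}",
        "Content-Length": str(len(payload)),
    }
    return headers, payload


if __name__ == "__main__":
    headers, payload = build_html_response("<p>caf\u00e9</p>")
    print(headers["Content-Type"])

    # The same bytes mean different things under different encodings.
    raw = b"+ADw-script+AD4-"
    print(raw.decode("ascii"))   # harmless-looking text
    print(raw.decode("utf-7"))   # decodes to markup
```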
Phase: Implementation
Inputs should be decoded and canonicalized to the application's current internal representation before being validated (CWE-180). Make sure that the application does not decode the same input twice (CWE-174). Such errors could be used to bypass allowlist validation schemes by introducing dangerous inputs after they have been checked.
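The order of operations described here (decode once, then validate the decoded form) can be sketched for percent-encoded input as follows. The function names and the filename rules are assumptions chosen for illustration. Rejecting any value that would still change under a second decoding pass is one possible policy, not the only one.

```python
from urllib.parse import unquote


def canonicalize_once(raw: str) -> str:
    """Percent-decode raw exactly once, rejecting multiply encoded input."""
    # errors="strict" makes invalid UTF-8 (including overlong forms)
    # raise UnicodeDecodeError instead of being silently replaced.
    decoded = unquote(raw, errors="strict")
    # If another pass would change the value, the input carried a second
    # layer of encoding (for example "%252E"); refuse it rather than
    # letting a later component decode it after validation.
    if unquote(decoded, errors="strict") != decoded:
        raise ValueError("input appears to be encoded more than once")
    return decoded


def validate_filename(raw: str) -> str:
    """Validate the canonical form, never the encoded form."""
    name = canonicalize_once(raw)
    if name in {"", ".", ".."} or any(c in name for c in ("/", "\\", "\x00")):
        raise ValueError("filename is not acceptable")
    return name
```

With this sketch, "report%2Etxt" is accepted as "report.txt", while "%2E%2E", "%252E%252E", and "a%2Fb" are all rejected.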

Vulnerability Mapping Notes

Justification: This CWE entry is at the Variant level of abstraction, which is a preferred level of abstraction for mapping to the root causes of vulnerabilities.
Comment: Carefully read both the name and description to ensure that this mapping is an appropriate fit. Do not try to 'force' a mapping to a lower-level Base/Variant simply to comply with this preferred level of abstraction.

Related Attack Patterns

CAPEC-ID	Attack Pattern Name
CAPEC-120	Double Encoding: The adversary repeats the encoding process for a set of characters (that is, encodes the character encoding of a character) to obfuscate the payload of a particular request. This may allow the adversary to bypass filters that attempt to detect illegal characters or strings, such as those that might be used in traversal or injection attacks. Filters may be able to catch illegal encoded strings, but may not catch doubly encoded strings. For example, a dot (.), often used in path traversal attacks and therefore often blocked by filters, could be URL encoded as %2E. However, many filters recognize this encoding and would still block the request. In a double encoding, the % in the above URL encoding would be encoded again as %25, resulting in %252E, which some filters might not catch but which could still be interpreted as a dot (.) by interpreters on the target.
CAPEC-267	Leverage Alternate Encoding: An adversary encodes potentially harmful input or content in a way that the application's validation does not handle effectively.
CAPEC-3	Using Leading 'Ghost' Character Sequences to Bypass Input Filters: Some APIs will strip certain leading characters from a string of parameters. An adversary can intentionally introduce leading "ghost" characters (extra characters that don't affect the validity of the request at the API layer) that enable the input to pass the filters, so that the adversary's input is processed. This occurs when the targeted API accepts input data in several syntactic forms and interprets them in a semantically equivalent way, while the filter does not take into account the full spectrum of the syntactic forms acceptable to the targeted API.
CAPEC-4	Using Alternative IP Address Encodings: This attack relies on the adversary using unexpected formats for representing IP addresses. Networked applications may expect network location information in a specific format, such as fully qualified domain names (FQDNs), URLs, IP addresses, or IP address ranges. If the location information is not validated against a variety of different possible encodings and formats, the adversary can use an alternate format to bypass application access control.
CAPEC-52	Embedding NULL Bytes: An adversary embeds one or more null bytes in input to the target software. This attack relies on the usage of a null-valued byte as a string terminator in many environments. The goal is for certain components of the target software to stop processing the input when they encounter the null byte(s).
CAPEC-53	Postfix, Null Terminate, and Backslash: If a string is passed through a filter of some kind, then a terminal NULL may not be valid. Using an alternate representation of NULL allows an adversary to embed the NULL mid-string while postfixing the proper data so that the filter is avoided. One example is a filter that looks for a trailing slash character. If a string insertion is possible, but the slash must exist, an alternate encoding of NULL in mid-string may be used.
CAPEC-64	Using Slashes and URL Encoding Combined to Bypass Validation Logic: This attack targets the encoding of the URL combined with the encoding of the slash characters. An attacker can take advantage of the multiple ways of encoding a URL and abuse the interpretation of the URL. A URL may contain special characters that need special syntax handling in order to be interpreted. Special characters are represented using a percent character followed by two hexadecimal digits representing the octet code of the original character (%HEX-CODE). For instance, the US-ASCII space character would be represented as %20. This is often referred to as escaped encoding or percent-encoding. Since the server decodes the URL from the requests, it may restrict access to some URL paths by validating and filtering the URL requests it receives. An attacker will try to craft a URL with a sequence of special characters which, once interpreted by the server, will be equivalent to a forbidden URL. It can be difficult to protect against this attack since the URL can contain other encoding formats such as UTF-8 encoding, Unicode encoding, etc.
CAPEC-71	Using Unicode Encoding to Bypass Validation Logic: An attacker may provide a Unicode string to a system component that is not Unicode aware and use that to circumvent the filter or cause the classifying mechanism to fail to properly understand the request. That may allow the attacker to slip malicious data past the content filter and/or possibly cause the application to route the request incorrectly.
CAPEC-72	URL Encoding: This attack targets the encoding of the URL. An adversary can take advantage of the multiple ways of encoding a URL and abuse the interpretation of the URL.
CAPEC-78	Using Escaped Slashes in Alternate Encoding: This attack targets the use of the backslash in alternate encoding. An adversary can provide a backslash as a leading character and cause a parser to believe that the next character is special. This is called an escape. By using that trick, the adversary tries to exploit alternate ways to encode the same character, which leads to filter problems and opens avenues to attack.
CAPEC-79	Using Slashes in Alternate Encoding: This attack targets the encoding of the slash characters. An adversary would try to exploit common filtering problems related to the use of slash characters to gain access to resources on the target host. Directory-driven systems, such as file systems and databases, typically use the slash character to indicate traversal between directories or other container components. For murky historical reasons, PCs (and, as a result, Microsoft OSs) chose to use a backslash, whereas the UNIX world typically makes use of the forward slash. The awkward result is that many MS-based systems are required to understand both forms of the slash. This gives the adversary many opportunities to discover and abuse a number of common filtering problems. The goal of this pattern is to discover server software that only applies filters to one version, but not the other.
CAPEC-80	Using UTF-8 Encoding to Bypass Validation Logic: This attack is a specific variation on leveraging alternate encodings to bypass validation logic. It leverages the possibility of encoding potentially harmful input in UTF-8 and submitting it to applications that do not expect this encoding standard or do not validate it effectively, which makes input filtering difficult. UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. Legal UTF-8 characters are one to four bytes long. However, early versions of the UTF-8 specification got some entries wrong (in some cases they permitted overlong characters). UTF-8 encoders are supposed to use the "shortest possible" encoding, but naive decoders may accept encodings that are longer than necessary. According to RFC 3629, a particularly subtle form of this attack can be carried out against a parser which performs security-critical validity checks against the UTF-8 encoded form of its input, but interprets certain illegal octet sequences as characters.
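Two of the patterns in the table above, overlong UTF-8 sequences (CAPEC-80) and alternate IP address spellings (CAPEC-4), can be countered by parsing input with a strict decoder before any security decision is made. The sketch below uses only the Python standard library; the wrapper names are assumptions introduced for illustration.

```python
import ipaddress


def decode_utf8_strict(data: bytes) -> str:
    """Decode UTF-8, raising UnicodeDecodeError on any illegal sequence.

    Python's decoder rejects overlong forms, so b"\\xc0\\xae" (an
    overlong spelling of ".") is refused instead of becoming a dot.
    """
    return data.decode("utf-8", errors="strict")


def parse_ipv4_strict(text: str) -> ipaddress.IPv4Address:
    """Accept only dotted-decimal IPv4 text with four octets.

    Spellings such as "0x7f.0.0.1" or "2130706433", which some
    resolvers treat as 127.0.0.1, raise a ValueError subclass here.
    """
    return ipaddress.IPv4Address(text)


if __name__ == "__main__":
    print(decode_utf8_strict(b"."))
    print(parse_ipv4_strict("127.0.0.1"))
    for candidate in ("0x7f.0.0.1", "2130706433"):
        try:
            parse_ipv4_strict(candidate)
        except ValueError as exc:
            print(f"rejected {candidate!r}: {exc}")
```

Any access control check would then run on the parsed value, so every accepted input has exactly one representation.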

Submission

Name	Organization	Date	Date release	Version
PLOVER		2006-07-19 +00:00	2006-07-19 +00:00	Draft 3

Modifications

Name	Organization	Date	Comment
Eric Dalci	Cigital	2008-07-01 +00:00	updated Potential_Mitigations, Time_of_Introduction
CWE Content Team	MITRE	2008-09-08 +00:00	updated Relationships, Taxonomy_Mappings
CWE Content Team	MITRE	2009-07-27 +00:00	updated Potential_Mitigations
CWE Content Team	MITRE	2010-12-13 +00:00	updated Name
CWE Content Team	MITRE	2011-03-29 +00:00	updated Potential_Mitigations
CWE Content Team	MITRE	2011-06-01 +00:00	updated Common_Consequences
CWE Content Team	MITRE	2012-05-11 +00:00	updated Related_Attack_Patterns, Relationships
CWE Content Team	MITRE	2012-10-30 +00:00	updated Potential_Mitigations
CWE Content Team	MITRE	2014-07-30 +00:00	updated Relationships
CWE Content Team	MITRE	2017-11-08 +00:00	updated Applicable_Platforms
CWE Content Team	MITRE	2019-01-03 +00:00	updated Related_Attack_Patterns
CWE Content Team	MITRE	2019-06-20 +00:00	updated Related_Attack_Patterns
CWE Content Team	MITRE	2020-02-24 +00:00	updated Potential_Mitigations, Relationships
CWE Content Team	MITRE	2020-06-25 +00:00	updated Potential_Mitigations
CWE Content Team	MITRE	2023-01-31 +00:00	updated Description
CWE Content Team	MITRE	2023-04-27 +00:00	updated Relationships
CWE Content Team	MITRE	2023-06-29 +00:00	updated Mapping_Notes
CWE Content Team	MITRE	2025-12-11 +00:00	updated Weakness_Ordinalities
