CAPEC-71

Using Unicode Encoding to Bypass Validation Logic
Medium
High
Draft
2014-06-23
00h00 +00:00
2022-09-29
00h00 +00:00

CAPEC Description

An attacker may provide a Unicode string to a system component that is not Unicode aware and use it to circumvent the filter or cause the classifying mechanism to fail to properly understand the request. This may allow the attacker to slip malicious data past the content filter and/or cause the application to route the request incorrectly.
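As an illustration of the mismatch described above, the following minimal sketch assumes two hypothetical components: a filter that inspects raw bytes for an ASCII slash, and a lenient decoder that accepts overlong 2-byte UTF-8 sequences. Both `byte_filter_allows` and `naive_decode_2byte` are invented for this example and do not represent any particular product. The sketch also shows that Python's built-in strict UTF-8 decoder rejects the same bytes.

```python
def naive_decode_2byte(data: bytes) -> str:
    """Hypothetical lenient decoder: accepts overlong 2-byte UTF-8 sequences."""
    out = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if 0xC0 <= b0 <= 0xDF and i + 1 < len(data):
            # Combine the payload bits without checking for overlong forms.
            out.append(chr(((b0 & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            out.append(chr(b0))
            i += 1
    return "".join(out)


def byte_filter_allows(data: bytes) -> bool:
    """Hypothetical filter that only looks for the ASCII slash byte."""
    return b"/" not in data


# 0xC0 0xAF is an overlong (invalid) encoding of "/".
payload = b"..\xc0\xaf..\xc0\xafetc"

filter_verdict = byte_filter_allows(payload)   # the filter sees no slash
lenient_result = naive_decode_2byte(payload)   # the lenient component sees slashes

try:
    payload.decode("utf-8")                    # a strict decoder rejects the bytes
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True
```

The filter and the decoder disagree about what the bytes mean, which is the gap this attack pattern relies on. A strict decoder placed in front of the filter closes it.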

CAPEC Information

Execution Flow

1) Explore

[Survey the application for user-controllable inputs] Using a browser or an automated tool, an attacker follows all public links and actions on a web site. They record all the links, the forms, the resources accessed and all other potential entry-points for the web application.

Technique
  • Use a spidering tool to follow and record all links and analyze the web pages to find entry points. Make special note of any links that include parameters in the URL.
  • Use a proxy tool to record all user input entry points visited during a manual traversal of the web application.
  • Use a browser to manually explore the website and analyze how it is constructed. Many browser plugins are available to facilitate the analysis or automate the discovery.
2) Experiment

[Probe entry points to locate vulnerabilities] The attacker uses the entry points gathered in the "Explore" phase as a target list and injects various Unicode-encoded payloads to determine whether an entry point actually represents a vulnerability with insufficient validation logic, and to characterize the extent to which the vulnerability can be exploited.

Technique
  • Try to use Unicode encoding of content in scripts in order to bypass validation routines.
  • Try to use Unicode encoding of content in HTML in order to bypass validation routines.
  • Try to use Unicode encoding of content in CSS in order to bypass validation routines.
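
When testing one's own validation routines against the techniques above, it can help to generate several encodings of the same probe string. The sketch below is one possible helper, not an exhaustive list of encodings. The function names (`to_fullwidth`, `percent_utf8`, `percent_u`, `unicode_variants`) are hypothetical, and `percent_u` assumes characters in the Basic Multilingual Plane.

```python
def to_fullwidth(text: str) -> str:
    """Map printable ASCII (0x21-0x7E) to the fullwidth forms (U+FF01-U+FF5E)."""
    return "".join(
        chr(ord(ch) + 0xFEE0) if 0x21 <= ord(ch) <= 0x7E else ch
        for ch in text
    )


def percent_utf8(text: str) -> str:
    """Percent-encode every byte of the UTF-8 encoding of the text."""
    return "".join(f"%{b:02X}" for b in text.encode("utf-8"))


def percent_u(text: str) -> str:
    """Non-standard %uXXXX form that some legacy decoders accept (BMP only)."""
    return "".join(f"%u{ord(ch):04X}" for ch in text)


def unicode_variants(probe: str) -> dict:
    """Return several encodings of the same probe, keyed by a descriptive label."""
    wide = to_fullwidth(probe)
    return {
        "percent_utf8": percent_utf8(probe),
        "percent_u": percent_u(probe),
        "fullwidth": wide,
        "fullwidth_percent_utf8": percent_utf8(wide),
    }


variants = unicode_variants("<a>")
```

Each variant can then be submitted to an entry point in a test environment to see whether the validation routine and the downstream consumer interpret it the same way.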

Prerequisites

Filtering is performed on data that has not been properly canonicalized.
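
The prerequisite can be sketched as follows, assuming a hypothetical denylist filter (`naive_filter_allows`) that runs on the raw input and a later processing step (`canonicalize`) that percent-decodes the value and applies NFKC normalization. Both functions are invented for illustration; real systems may canonicalize differently.

```python
import unicodedata
from urllib.parse import unquote


def naive_filter_allows(raw: str) -> bool:
    """Hypothetical denylist check applied to the raw, non-canonical input."""
    return "<" not in raw and ">" not in raw


def canonicalize(raw: str) -> str:
    """Percent-decode as UTF-8, then fold compatibility characters with NFKC."""
    return unicodedata.normalize("NFKC", unquote(raw))


# Percent-encoded fullwidth angle brackets (U+FF1C, U+FF1E) around "script".
raw_input = "%EF%BC%9Cscript%EF%BC%9E"

passed_filter = naive_filter_allows(raw_input)   # the check runs too early
downstream_value = canonicalize(raw_input)       # what a later component may see
```

The raw string contains no angle brackets, so the filter passes it, yet the canonical form is `<script>`. Running the same filter on `downstream_value` would reject it.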

Skills Required

An attacker needs to understand Unicode encodings and have an idea (or be able to find out) what system components may not be Unicode aware.

Mitigations

Ensure that the system is Unicode aware and can properly process Unicode data. Do not assume that data will be in ASCII.
Ensure that filtering or input validation is applied to canonical data.
Assume all input is malicious. Create an allowlist that defines all valid input to the software system based on the requirements specifications. Input that does not match against the allowlist should not be permitted to enter into the system.
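
A minimal sketch of how these mitigations might fit together is shown below: decode strictly, canonicalize, and only then apply an allowlist. The username rule, the `ALLOWED_USERNAME` pattern, and the `validate_username` function are assumptions made for this example. The choice of a single round of percent-decoding followed by NFKC is also an assumption, and the right canonical form depends on the application.

```python
import re
import unicodedata
from urllib.parse import unquote

# Hypothetical requirement: 3-32 ASCII letters, digits, "_" or "-".
ALLOWED_USERNAME = re.compile(r"[A-Za-z0-9_-]{3,32}")


def validate_username(raw: bytes) -> str:
    """Return the canonical username, or raise ValueError if it is not allowed."""
    try:
        # Strict decoding rejects malformed and overlong UTF-8 sequences.
        text = raw.decode("utf-8")
        # One round of percent-decoding; leftover "%" fails the allowlist below.
        text = unquote(text, encoding="utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError("input is not well-formed UTF-8") from exc
    # Fold compatibility characters (for example fullwidth forms) before checking.
    canonical = unicodedata.normalize("NFKC", text)
    if ALLOWED_USERNAME.fullmatch(canonical) is None:
        raise ValueError("input does not match the allowlist")
    return canonical
```

In this sketch `validate_username(b"alice_01")` returns the input unchanged, while overlong byte sequences, percent-encoded fullwidth angle brackets, and double-encoded values all raise `ValueError`. Fullwidth letters are folded to ASCII and accepted, which is a design choice; an application could instead reject any input that changes under normalization.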

Related Weaknesses

CWE-ID Weakness Name

CWE-176

Improper Handling of Unicode Encoding
The product does not properly handle when an input contains Unicode encoding.

CWE-179

Incorrect Behavior Order: Early Validation
The product validates input before applying protection mechanisms that modify the input, which could allow an attacker to bypass the validation via dangerous inputs that only arise after the modification.

CWE-180

Incorrect Behavior Order: Validate Before Canonicalize
The product validates input before it is canonicalized, which prevents the product from detecting data that becomes invalid after the canonicalization step.

CWE-173

Improper Handling of Alternate Encoding
The product does not properly handle when an input uses an alternate encoding that is valid for the control sphere to which the input is being sent.

CWE-172

Encoding Error
The product does not properly encode or decode the data, resulting in unexpected values.

CWE-184

Incomplete List of Disallowed Inputs
The product implements a protection mechanism that relies on a list of inputs (or properties of inputs) that are not allowed by policy or otherwise require other action to neutralize before additional processing takes place, but the list is incomplete.

CWE-183

Permissive List of Allowed Inputs
The product implements a protection mechanism that relies on a list of inputs (or properties of inputs) that are explicitly allowed by policy because the inputs are assumed to be safe, but the list is too permissive - that is, it allows an input that is unsafe, leading to resultant weaknesses.

CWE-74

Improper Neutralization of Special Elements in Output Used by a Downstream Component ('Injection')
The product constructs all or part of a command, data structure, or record using externally-influenced input from an upstream component, but it does not neutralize or incorrectly neutralizes special elements that could modify how it is parsed or interpreted when it is sent to a downstream component.

CWE-20

Improper Input Validation
The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.

CWE-697

Incorrect Comparison
The product compares two entities in a security-relevant context, but the comparison is incorrect, which may lead to resultant weaknesses.

CWE-692

Incomplete Denylist to Cross-Site Scripting
The product uses a denylist-based protection mechanism to defend against XSS attacks, but the denylist is incomplete, allowing XSS variants to succeed.

References

REF-1

Exploiting Software: How to Break Code
G. Hoglund, G. McGraw.

Submission

Name Organization Date Date release
CAPEC Content Team The MITRE Corporation 2014-06-23 +00:00

Modifications

Name Organization Date Comment
CAPEC Content Team The MITRE Corporation 2017-01-09 +00:00 Updated Related_Attack_Patterns
CAPEC Content Team The MITRE Corporation 2018-07-31 +00:00 Updated References
CAPEC Content Team The MITRE Corporation 2020-07-30 +00:00 Updated Execution_Flow, Mitigations
CAPEC Content Team The MITRE Corporation 2020-12-17 +00:00 Updated Taxonomy_Mappings
CAPEC Content Team The MITRE Corporation 2021-06-24 +00:00 Updated Related_Weaknesses
CAPEC Content Team The MITRE Corporation 2022-09-29 +00:00 Updated Example_Instances