Output encoding is the process of transforming user-supplied data into a safe format appropriate for the specific context where it will be rendered. Rather than trying to remove dangerous characters from input, output encoding converts them into harmless representations that display correctly but cannot be interpreted as code.
How It Works
The core principle is context-aware encoding. Data that is safe in one context can be dangerous in another. The character < is harmless in a SQL query but dangerous in HTML, where it opens a tag. A single quote is safe in HTML body content but terminates a single-quoted JavaScript string. Output encoding addresses this by applying different encoding rules depending on where the data appears.
In HTML body context, encoding converts characters like <, >, &, ", and ' to their HTML entity equivalents (&lt;, &gt;, &amp;, &quot;, and &#x27;). This ensures that user input containing <script>alert(1)</script> renders as visible text rather than executing as code.
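As a minimal sketch of HTML body encoding, the Python standard library's html.escape can stand in for whatever encoder a given framework provides. The function name encode_for_html_body is an illustrative assumption, not part of any library.

```python
import html


def encode_for_html_body(untrusted: str) -> str:
    """Encode untrusted text for placement between HTML tags.

    Hypothetical helper name; html.escape is the real standard-library call.
    quote=True (the default) also encodes double and single quotes.
    """
    return html.escape(untrusted, quote=True)


payload = "<script>alert(1)</script>"
encoded = encode_for_html_body(payload)
print(encoded)  # &lt;script&gt;alert(1)&lt;/script&gt;
```

The browser displays the encoded string as the literal text of the payload, and no script element is created.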
In HTML attribute context, the same characters must be encoded, and the attribute value must be quoted. In JavaScript context, data must be JavaScript-encoded (or better yet, placed in a data attribute and read from the DOM rather than embedded directly in script blocks). In URL context, special characters must be percent-encoded. In CSS context, characters must be CSS-escaped.
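The per-context rules above might be sketched as one small encoder per context. All four function names are hypothetical. The JavaScript and CSS encoders are hand-rolled illustrations built on standard-library calls, not a vetted library, and a production system would normally rely on its template engine or a maintained encoding library instead.

```python
import html
import json
from urllib.parse import quote


def encode_html_attribute(value: str) -> str:
    """Return a complete, double-quoted HTML attribute value."""
    return '"' + html.escape(value, quote=True) + '"'


def encode_js_string(value: str) -> str:
    """Return a JavaScript string literal, surrounding quotes included.

    json.dumps escapes quotes, backslashes, control characters and (by
    default) non-ASCII characters. The extra replacements keep sequences
    such as </script> from ending an enclosing script block.
    """
    literal = json.dumps(value)
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


def encode_url_component(value: str) -> str:
    """Percent-encode a value for use as a single URL component."""
    return quote(value, safe="")


def encode_css_value(value: str) -> str:
    """Escape every character except ASCII letters and digits as a CSS hex escape.

    The trailing space terminates the escape so following characters are
    not read as additional hex digits.
    """
    return "".join(
        ch if ch.isascii() and ch.isalnum() else f"\\{ord(ch):x} "
        for ch in value
    )


print(encode_html_attribute('" onmouseover="x'))
print(encode_js_string("</script>"))
print(encode_url_component("a b&c=d/e"))
print(encode_css_value("a;b"))
```

Each encoder neutralizes the characters that carry meaning in its own context, which is why the output of one cannot be substituted for another.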
The critical mistake is applying the wrong encoding for the context or encoding only once when data passes through multiple contexts. Data that goes from a database into a JavaScript variable that is then inserted into HTML passes through two contexts and may need encoding for each.
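To illustrate layered encoding, here is a sketch in which an untrusted search term passes through two nested contexts: a URL query parameter that itself sits inside an HTML attribute. The function name build_search_link and the /search path are assumptions made for the example. The inner context is encoded first and the outer context last.

```python
import html
from urllib.parse import quote


def build_search_link(untrusted_query: str) -> str:
    """Build an anchor tag whose href and label both contain untrusted data."""
    # Inner context: the value is one URL query parameter, so percent-encode it.
    url = "/search?lang=en&q=" + quote(untrusted_query, safe="")
    # Outer context: the whole URL sits in a quoted HTML attribute.
    href = html.escape(url, quote=True)
    # Separate context: the visible label is HTML body text.
    label = html.escape(untrusted_query)
    return f'<a href="{href}">{label}</a>'


link = build_search_link('a&b"<x>')
print(link)
```

Skipping either step leaves a gap: without percent-encoding the value can inject extra query parameters, and without HTML encoding the literal ampersand separating the parameters is not valid attribute content.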
Relationship to Input Validation
Output encoding and input validation are complementary, not interchangeable. Input validation rejects or sanitizes data at the point of entry. Output encoding protects data at the point of use. Relying solely on input validation is fragile because it requires anticipating every possible output context at the time of input. Output encoding is applied at render time, where the context is known.
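The division of labor can be sketched as follows, assuming a hypothetical display-name field. The allow-list pattern and both function names are illustrative choices, not a recommended policy. Validation rejects malformed input at entry, and encoding still runs at render time because legitimately valid values can contain characters that are special in HTML.

```python
import html
import re

# Hypothetical allow-list: a letter followed by up to 63 letters, spaces,
# apostrophes, ampersands, periods or hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z '&.-]{0,63}")


def validate_display_name(raw: str) -> str:
    """Point of entry: reject anything outside the allow-list."""
    if not NAME_PATTERN.fullmatch(raw):
        raise ValueError("invalid display name")
    return raw


def render_greeting(name: str) -> str:
    """Point of use: encode for the HTML body context, which is known here."""
    return "<p>Hello, " + html.escape(name) + "</p>"


name = validate_display_name("Tom & Jerry")
greeting = render_greeting(name)
print(greeting)  # <p>Hello, Tom &amp; Jerry</p>
```

The name passes validation yet contains an ampersand, so the rendered output is only correct because encoding is applied independently of the validation step.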
Why It Matters
Output encoding is a primary defense against cross-site scripting (XSS). When applied correctly and consistently across all output contexts, it prevents user-supplied data from being interpreted as code. Missing or incorrect encoding in a single location is often all an attacker needs to achieve script execution.