In the shade of a tree: Insecure Features In PDFs

In 2019, we published attacks on PDF Signatures and PDF Encryption. During our research and studying the related work, we discovered a lot of blog posts, talks, and papers focusing on malicious PDFs causing some damage. However, there was no systematic analysis of all possible dangerous features supported by PDFs, but only isolated exploits and attack concepts.

We decided to fill this gap and systematize the possibilities to use legitimate PDF features and do bad stuff. We define four attack categories: Denial of Service, Information Disclosure, Data Manipulation, and Code Execution.

Our evaluation reveals 26 of 28 popular PDF processing applications are vulnerable to at least one attack. You can download all malicious PDFs here. You can also find more technical details in our NDSS'21 paper.

This is a joined work of Jens Müller, Dominik Noss, Christian Mainka, Vladislav Mladenov, and Jörg Schwenk.

Dangerous Paths: Overview

To identify attack vectors, we systematically surveyed which potentially dangerous features exist in the PDF specification. We created a comprehensive list with all PDF Actions that can be called. This list contains 18 different actions that we carefully studied.

We selected eight actions – the ones that directly or indirectly allow access to a file handle and may therefore be abused for dangerous features such as URL invocation or writing to files. Having a list of security-sensitive actions, we proceeded by investigating all objects and related events that can trigger these actions.

We identified four PDF objects which allow calling arbitrary actions (Page, Annotation, Field, and Catalog). Most objects offer multiple alternatives for this purpose. For example, the Catalog object, defines the OpenAction or additional actions (AA) events. Each event can launch any sequence of PDF actions, for example, Launch, Thread, etc. JavaScript actions can be embedded within documents. It opens a new area for attacks, for example, new annotations can be created that can have actions which once again lead to accessing file handles.

Denial of Service

The goal of the denial of service class of attacks is enforcing to process PDF applications in consuming all available resources (i.e., computing time or memory) or causes them to crash by opening a specially crafted PDF document. We identified two variants: Infinite Loop and Deflate Bomb.

Infinite Loop

This variant induces an endless loop causing the program execution to get stuck. The PDF standard allows various elements of the document structure to reference to themselves, or to other elements of the same type.

Action loop: PDF actions allow to specify a Next action to be performed, thereby resulting in "action cycles".

ObjStm loop: Object streams may extend other object streams allows the crafting of a document with cycles.

Outline loop: PDF documents may contain an outline. Its entries, however, can refer to themselves or each other.

Calculations: PDF defines "Type 4" calculator functions, for example, to transform colors. Processing hard-to-solve mathematical formulas may lead to high demands of CPU.

JavaScript: Finally, in case the PDF application processes scripts within documents, infinite loops can be induced.

Deflate Bomb

Data amplification attacks based on malicious zip archives are well-known. The first publicly documented DoS attack using a "zip bomb" was conducted in 1996 against a Fidonet BBS administrator. However, not only zip files but also stream objects within PDF documents can be compressed using various algorithms such as Deflate to reduce the overall file size.

Information Disclosure

The goal of this class of attacks is to track the usage of a document by silently invoking a connection to the attacker's server once the file is opened, or to leak PDF document form data, local files, or NTLM credentials to the attacker.

URL Invocation

PDF documents that silently "phone home" should be considered as privacy-invasive. They can be used, for example, to deanonymize reviewers, journalists, or activists behind a shared mailbox. The attack's goal is to open a backchannel to an attacker-controlled server once the PDF file is opened by the victim.

The possibility of malicious URI resolving in PDF documents has been introduced by Hamon [1] who gave an evaluation for URI and SubmitForm actions in Acrobat Reader. We extend their analysis to all standard PDF features that allow opening a URL, such as ImportData, Launch, GoToR, and JavaScript.

Form Data Leakage

Documents can contain forms to be filled out by the user – a feature introduced with PDF version 1.2 in 1996 and used on a daily basis for routine offices tasks, such as travel authorization or vacation requests. The idea of this attack is as follows: The victim downloads a form – a PDF document which contains form fields – from an attacker controlled source and fills it out on the screen, for example, in order to print it. The form is manipulated by the attacker in such a way that it silently send input data to the attacker's server.

Local File Leakage

The PDF standard defines various methods to embed external files into a document or otherwise access files on the host's file system, as documented below.

External streams: Documents can contain stream objects (e.g., images) to be included from external files on disk.
Reference XObjects: This feature allows a document to import content from another (external) PDF document.
Open Prepress Interface: Before printing a document, local files can be defined as low-resolution placeholders.
Forms Data Format (FDF): Interactive form data can be stored in, and auto-imported from, external FDF files.
JavaScript functions: The Adobe JavaScript reference enables documents to read data from or import local files.

If a malicious document managed to firstly read files from the victim's disk and secondly, send them back to the attacker, such behavior would arguably be critical.

Credential Theft

In 1997, Aaron Spangler posted a vulnerability in Windows NT on the Bugtraq mailing list [2]: Any client program can trigger a connection to a rogue SMB server. If the server requests authentication, Windows will automatically try to log in with a hash of the user's credentials. Such captured NTLM hashes allow for efficient offline cracking and can be re-used by applying pass-the-hash or relay attacks to authenticate under the user's identity. In April 2018, Check Point Research [3] showed that similar attacks can be performed with malicious PDF files. They found that the target of GoToR and GoToE actions can be set to \\attacker.com\dummyfile, thereby leaking credentials in the form of NTLM hashes.

Data Manipulation

This attack class deals with the capabilities of malicious documents to silently modify form data, to write to local files on the host's file system, or to show a different content based on the application that is used to open the document.

Form Modification

The idea of this attack is as follows: Similar to Form Data Leakage attacks, the victim obtains a harmlessly looking PDF document from an attacker controlled source, for example, a remittance slip or a tax form. The goal of the attacker is to dynamically, and without knowledge of the victim, manipulate form field data.

File Write Access

The PDF standard enables documents to submit form data to external webservers. Technically the webserver's URL is defined using a PDF File Specification. This ambiguity in the standard may be interpreted by implementations in such a way that they enable documents to submit PDF form data to a local file, thereby writing to this file.

Content Masking

The goal of this attack is to craft a document that renders differently, depending on the applied PDF interpreter. This can be used, for example, to show different content to different reviewers, to trick content filters (AI-based machines as well as human content moderators), plagiarism detection software, or search engines, which index a different text than the one shown to users when opening the document.

Stream confusion: It is unclear how content streams are parsed if their Length value does not match the offset of the endstream marker, or if syntax errors are introduced.
Object confusion: An object can overlay another object. The second object may not be processed if it has a duplicate object number, if it is not listed in the XRef table, or if other structural syntax errors are introduced.
Document confusion: A PDF file can contain yet another document (e.g., as embedded file), multiple XRef tables, etc., which results in ambiguities on the structural level.
PDF confusion: Objects before the PDF header or after an EOF marker may be processed by implementations, introducing ambiguities in the outer document structure.

Code Execution

The goal of this attack is to execute attacker-controlled code. This can be achieved by silently launching an executable file, embedded within the document, to infect the host with malware. The PDF specification defines the Launch action, which allows documents to launch arbitrary applications. The file to be launched can either be specified by a local path, a network share, a URL, or a file embedded within the PDF document itself.