Unsafe ML Model Load

python.unsafe_ml_model_load

Severity

high

Resource

Injection

Language

Python

Description

Deserialization of untrusted data via torch.load can lead to arbitrary code execution, allowing attackers to compromise the system.

Rationale

PyTorch’s torch.load() function is commonly used to deserialize model files saved with torch.save(). However, this deserialization process is based on Python’s pickle module, which is inherently unsafe when handling untrusted input.

Using torch.load() on a file that has been tampered with by an attacker may lead to arbitrary code execution, because pickle can be instructed to import and call arbitrary Python callables (such as os.system) while reconstructing objects.

Consider the following example:

```
import torch

# Loading a model from an untrusted source (unsafe)
model = torch.load("model_from_external_source.pt")
```

If model_from_external_source.pt was crafted maliciously, executing this line could run arbitrary Python code embedded in the file.
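Because torch.load() builds on pickle, the underlying mechanism can be shown with the standard library alone. The following minimal sketch does not use PyTorch. The `MaliciousPayload` class is a hypothetical stand-in for attacker-crafted content, and its payload is a harmless `eval()` of an arithmetic expression rather than a real attack. Its `__reduce__` method tells pickle to rebuild the object by calling `eval`, so the expression runs the moment the bytes are deserialized.

```python
import pickle


class MaliciousPayload:
    """Hypothetical attacker-controlled object (illustration only)."""

    def __reduce__(self):
        # Instead of describing its state, the object tells pickle:
        # "to rebuild me, call eval() with this string". A real attacker
        # could name os.system or any other importable callable here.
        return (eval, ("sum(range(10))",))


# Attacker side: serialize the object into the bytes that get distributed.
malicious_bytes = pickle.dumps(MaliciousPayload())

# Victim side: merely deserializing the bytes calls eval() on the payload.
loaded = pickle.loads(malicious_bytes)
print(loaded)  # 45
```

The victim never calls anything on `loaded`. The payload has already run inside `pickle.loads()`, the same kind of unpickling step that torch.load() performs on a checkpoint's pickled data.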

Remediation

To mitigate this vulnerability:

Avoid loading models from untrusted or unauthenticated sources. Only use torch.load() on files you fully trust.
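Use torch.jit.load() instead of torch.load() when feasible. TorchScript archives are not deserialized with Python's general-purpose pickle module, which makes them safer. A TorchScript model is still a program, however, so it should also come from a trusted source.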

Safer alternative using torch.jit.load():
```
import torch

# Loading a JIT-serialized model (safer)
model = torch.jit.load("trusted_scripted_model.pt")
```
Validate file integrity using hashes or digital signatures if model files must be loaded from remote or external locations.
Run the deserialization process in a sandboxed or isolated environment if untrusted data must be handled for any reason (e.g., academic research or debugging).
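
For the integrity check mentioned above, a minimal sketch using SHA-256 from Python's standard library follows. The helper names `sha256_of_file` and `verify_model_file`, the file name, and its contents are hypothetical. The expected digest is assumed to come from a trusted channel separate from the model file itself (for example, a signed manifest). Digital signatures are not shown. A matching hash only proves that the file is the one you expected, not that its publisher is trustworthy.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path, expected_sha256):
    """Return path if its digest matches, otherwise raise ValueError."""
    actual = sha256_of_file(path)
    # compare_digest performs a constant-time comparison of the two digests.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"integrity check failed for {path}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pt"  # hypothetical file
    model_path.write_bytes(b"example model bytes")
    # In practice this value comes from a trusted manifest, not the file.
    expected = hashlib.sha256(b"example model bytes").hexdigest()

    verified_path = verify_model_file(model_path, expected)
    # Only a verified file would then be handed to the model loader.

    model_path.write_bytes(b"tampered bytes")  # simulate tampering
    try:
        verify_model_file(model_path, expected)
        tampered_rejected = False
    except ValueError:
        tampered_rejected = True

print(tampered_rejected)  # True
```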

As a best practice, implement strict controls around model intake pipelines and automate validation steps as part of your MLOps process.
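
One validation step an intake pipeline could automate is to refuse, during unpickling, any global that is not explicitly allowed. The sketch below follows the "Restricting Globals" pattern from the Python `pickle` documentation by overriding `pickle.Unpickler.find_class`. `torch.load(..., weights_only=True)` applies a comparable allowlist to model checkpoints. The allowlist, the helper names, and the `MaliciousPayload` class are hypothetical. This narrows the attack surface but does not make arbitrary pickle data safe, so it complements rather than replaces the measures above.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist of (module, name) pairs the pipeline accepts.
# A real model format needs its own carefully reviewed list.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve globals outside the allowlist."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data):
    """Deserialize bytes while resolving only allowlisted globals."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class MaliciousPayload:
    """Hypothetical attacker object that asks pickle to call eval()."""

    def __reduce__(self):
        return (eval, ("sum(range(10))",))


# Data built only from allowlisted types loads normally.
state = restricted_loads(pickle.dumps(OrderedDict(weight=[0.1, 0.2])))
print(state["weight"])  # [0.1, 0.2]

# The payload is rejected when pickle tries to resolve builtins.eval,
# before eval() can ever be called.
try:
    restricted_loads(pickle.dumps(MaliciousPayload()))
    outcome = "loaded"
except pickle.UnpicklingError as exc:
    outcome = str(exc)
print(outcome)  # blocked global: builtins.eval
```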

Configuration

The detector has the following configurable parameters:

sources: the source kinds to check.
neutralizations: the neutralization kinds to check.

Unless you need to change the default behavior, you typically do not need to configure this detector.

References

CWE-502: Deserialization of Untrusted Data
OWASP A08:2021 – Software and Data Integrity Failures
PyTorch: Library Reference