Unsafe ML Model Load
ID | python.unsafe_ml_model_load
Severity | high
Resource | Injection
Language | Python
Tags | CWE:502, NIST.SP.800-53, OWASP:2021:A8, PCI-DSS:6.5.1
Description
Deserialization of untrusted data via torch.load can lead to arbitrary code execution, allowing attackers to compromise the system.
Rationale
PyTorch’s torch.load() function is commonly used to deserialize model files saved with torch.save(). However, this deserialization process is based on Python’s pickle module, which is inherently unsafe when handling untrusted input.
Using torch.load() on a file that has been tampered with by an attacker may lead to arbitrary code execution, because pickle can call attacker-chosen callables while it reconstructs objects.
Consider the following example:
import torch
# Loading a model from an untrusted source (unsafe)
model = torch.load("model_from_external_source.pt")
If model_from_external_source.pt was crafted maliciously, executing this line could run arbitrary Python code embedded in the file.
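The underlying mechanism can be reproduced with the standard library alone. The following is a minimal sketch, not PyTorch's actual loading code: the Payload class is a hypothetical stand-in for whatever an attacker embeds in a model file, and the harmless expression passed to eval stands in for an arbitrary command.

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled object (illustration only)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it during loading.
        # A real attack could name something like os.system here; this sketch
        # evaluates a harmless arithmetic expression instead.
        return (eval, ("6 * 7",))


# The attacker serializes the payload ...
blob = pickle.dumps(Payload())

# ... and the victim merely deserializes it. No Payload instance comes back:
# the callable chosen by the attacker runs during loads(), and its return
# value is what loads() hands to the caller.
result = pickle.loads(blob)
print(result)
```

The victim never calls any method on the loaded object; deserializing the bytes is enough to trigger the call.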
Remediation
To mitigate this vulnerability:
- Avoid loading models from untrusted or unauthenticated sources. Only use torch.load() on files you fully trust.
- Use torch.jit.load() instead of torch.load() when feasible, as JIT-serialized models are not loaded through Python’s pickle module and are thus safer.
  Safer alternative using torch.jit.load():
  import torch
  # Loading a JIT-serialized model (safer)
  model = torch.jit.load("trusted_scripted_model.pt")
- Validate file integrity using hashes or digital signatures if model files must be loaded from remote or external locations.
- Run the deserialization process in a sandboxed or isolated environment if untrusted data must be handled for any reason (e.g., academic research or debugging).
As a best practice, implement strict controls around model intake pipelines and automate validation steps as part of your ML Ops process.
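The integrity check mentioned above could look roughly like the following. This is a sketch under stated assumptions: the helper names sha256_of_file and is_trusted_model are hypothetical, and the expected digest is assumed to have been obtained over a trusted channel, separately from the model file itself.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for large model files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted_model(path, expected_sha256):
    """Check a model file against a digest published by a trusted party."""
    actual = sha256_of_file(path)
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A caller would pass the file to torch.load() only when is_trusted_model() returns True. A matching hash only shows that the file is the one the publisher described, so this check complements, rather than replaces, trusting the publisher.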
Configuration
The detector has the following configurable parameters:
- sources, which indicates the source kinds to check.
- neutralizations, which indicates the neutralization kinds to check.
Unless you need to change the default behavior, you typically do not need to configure this detector.
References
- CWE-502: Deserialization of Untrusted Data.
- OWASP A08:2021 – Software and Data Integrity Failures.
- PyTorch: Library Reference.