• Home
  • About
  • Locations
logologologologo
  • Plan
    • vCISO
    • Policies & Procedures
    • Strategy & Security Program Creation
    • Risk Management
  • Attack
    • Penetration Testing
    • PTaaS
    • Red Teaming
    • Web Application Penetration Testing
    • Mobile Application Penetration Testing
    • IOT Penetration Testing
  • Defend
    • Office 365 Security
    • HIPAA Compliance
    • PCI Compliance
    • Code Reviews
    • Blockchain Security Analysis
    • Vulnerability Assessments
  • Recover
    • Ransomware Recovery
    • Expert Witness
    • Forensics
  • Learn
    • Resources
    • Penetration Testing Training
    • Blog
  • Contact Us
  • Instant Quote
✕

Hugging Face RCE Vulnerability Exposes Millions of AIs

June 6, 2026

Hugging Face RCE Vulnerability Exposes Millions of AI Systems

The Hugging Face RCE vulnerability tracked as CVE-2026-4372 is one of the most consequential AI security disclosures of 2026. A critical remote code execution flaw buried inside the Hugging Face Transformers library silently executed attacker-controlled code on any system that loaded a poisoned AI model. No special flags. No security warnings. No user interaction beyond the standard from_pretrained() call that millions of developers use every single day.

This Hugging Face RCE vulnerability bypassed one of the platform's primary security controls, the trust_remote_code=False setting, which organizations explicitly rely on to prevent untrusted code execution. Discovered by Pluto Security researcher Yotam Perkal and disclosed in June 2026, the flaw affected all Transformers versions from 4.56.0 through 5.2.x, a vulnerable window stretching nearly six months from August 2025 until the silent patch in version 5.3.0 released March 3, 2026.

The blast radius of this Hugging Face RCE vulnerability is enormous. The Transformers library records over 2.2 billion total installs and approximately 146 million monthly downloads. Vulnerable versions continue to be downloaded seven to eight million times per week and account for roughly one quarter of all weekly installations at the time of disclosure. Every enterprise AI pipeline, ML research environment, and production inference system running an affected version is a potential target.

If your team uses Hugging Face Transformers and has not confirmed an upgrade to version 5.3.0, you are exposed right now. Here is everything your security and DevSecOps teams need to understand.


What Is the Hugging Face RCE Vulnerability CVE-2026-4372

The Hugging Face RCE vulnerability CVE-2026-4372 is a critical remote code execution flaw in the Hugging Face Transformers library that allows attackers to execute arbitrary Python code on a victim's system simply by hosting a malicious AI model on Hugging Face Hub. The victim needs only to load the model using the standard from_pretrained() API call. No additional configuration, no special flags, and no unusual user behavior is required.

The vulnerability stems from improper handling of untrusted data in model configuration files, specifically targeting the _attn_implementation_internal attribute inside a model's config.json file.

CVE-2026-4372 at a glance:

  • CVE ID: CVE-2026-4372
  • Severity: Critical
  • Affected Versions: Hugging Face Transformers 4.56.0 through 5.2.x
  • Condition: Requires the optional kernels package to be installed
  • Vulnerable Period: August 2025 through March 3, 2026
  • Discovered By: Yotam Perkal, Pluto Security
  • Patched In: Transformers version 5.3.0 released March 3, 2026
  • Attack Vector: Remote, unauthenticated, no user interaction required
  • Security Control Bypassed: trust_remote_code=False
  • Data at Risk: AWS credentials, SSH keys, API tokens, environment variables, cloud workload access

The flaw mirrors previous machine learning ecosystem vulnerabilities including PyTorch's weights_only bypass tracked as CVE-2025-32434, where designated safe modes failed to prevent code execution. It also parallels a ChromaDB RCE disclosed the previous month, where malicious model configurations on Hugging Face Spaces triggered code execution in Chroma servers. The Hugging Face RCE vulnerability is part of a clear and accelerating pattern of AI supply chain attacks targeting the model loading infrastructure that the entire ML industry depends on.


How the Hugging Face RCE Vulnerability Works: A Technical Breakdown

This section explains exactly how CVE-2026-4372 works from attacker setup through code execution on the victim's system. Understanding the full technical chain is essential for DevSecOps teams and security architects building defenses for AI pipelines.

The Three Design Decisions That Created the Vulnerability

The Hugging Face RCE vulnerability does not result from a single coding error. It arises from the combination of three separate design decisions that individually seemed reasonable but together introduced a silent RCE path.

Design decision one: Automatic configuration download during model loading

When a developer calls AutoModelForCausalLM.from_pretrained("model-name"), the Transformers library automatically downloads the model's configuration file, weights, and tokenizer from Hugging Face Hub. The library then assembles the correct architecture and returns a ready-to-use model object. This convenience is exactly why Transformers has 2.2 billion installs. But it also means that any attacker who can host a model on Hugging Face Hub can deliver a malicious config.json to any developer who loads their model.

Design decision two: Unrestricted attribute injection via config.json

The library did not enforce a denylist on which attributes could be set through config.json during the vulnerable period. An attacker could inject the _attn_implementation_internal attribute into a model's configuration file and point it at a malicious kernel repository hosted on Hugging Face Hub.

Design decision three: Automatic kernel import without trust_remote_code

When the library encounters the _attn_implementation_internal attribute during model loading, it automatically downloads and imports the referenced kernel module. Critically, this import occurred without requiring trust_remote_code=True, the explicit flag developers use to signal they intentionally want to allow remote code. The safety check was entirely bypassed. Attacker-controlled Python code executed silently during what appeared to be a standard model load.

The Full Attack Chain

Here is the complete exploitation sequence for the Hugging Face RCE vulnerability from attacker setup through victim compromise:

  • Attacker creates a Hugging Face Hub account and uploads a model repository
  • Attacker modifies the model's config.json to inject the _attn_implementation_internal attribute
  • The attribute references a malicious kernel repository also hosted on Hugging Face Hub containing arbitrary Python code
  • Victim developer, data scientist, or automated ML pipeline calls from_pretrained() with the model name
  • Transformers downloads the malicious config.json automatically
  • Library detects the _attn_implementation_internal attribute and initiates kernel loading
  • Malicious kernel repository downloads and imports automatically without triggering trust_remote_code checks
  • Attacker-controlled Python code executes immediately on the victim's system
  • Attacker harvests AWS credentials, SSH keys, API tokens, and environment variables from the compromised environment
  • Victim sees no unusual warnings, prompts, or error messages throughout

The entire chain executes within the normal model loading flow. From the developer's perspective, the model simply takes slightly longer to load. No security alert fires. No unusual behavior is visible. The compromise is silent.

What Attackers Can Access After Exploitation

Successful exploitation of the Hugging Face RCE vulnerability grants attackers code execution in the context of whatever process is running the from_pretrained() call. In most ML environments, this means access to:

  • AWS credentials and IAM role tokens stored in environment variables
  • SSH private keys accessible from the execution environment
  • Hugging Face API tokens with write access to model repositories
  • Google Cloud and Azure service account credentials
  • Database connection strings and application API keys
  • The full file system accessible to the ML process user
  • Network connectivity to internal services reachable from the ML host
  • Container breakout opportunities if running in inadequately isolated containers

In production inference environments running GPU-backed servers, this access level provides a direct foothold into enterprise cloud infrastructure from what appears to be a routine model loading operation.

The Six-Month Silent Exposure Window

The Hugging Face RCE vulnerability was introduced in Transformers version 4.56.0 in August 2025. It remained exploitable until the silent patch in version 5.3.0 on March 3, 2026. That is a six-month window during which any developer or automated pipeline loading models from Hugging Face Hub while running the kernels optional package was silently exposed.

The patch was not announced as a security fix at the time of release. Pluto Security disclosed the vulnerability publicly in June 2026, more than three months after the fix was already available. This means organizations that did not upgrade to 5.3.0 for unrelated reasons remained exposed throughout that entire period with no knowledge of the risk.


The LeRobot RCE Vulnerability: A Parallel Hugging Face Threat

Alongside CVE-2026-4372, a second critical Hugging Face RCE vulnerability affecting a different product also emerged in 2026. CVE-2026-25874 affects LeRobot, Hugging Face's open-source machine learning framework for real-world robotics, which has nearly 24,000 stars on GitHub.

This vulnerability carries a CVSS score of 9.3 and allows completely unauthenticated remote attackers to execute arbitrary system commands on any host running the LeRobot PolicyServer component.

How CVE-2026-25874 works:

LeRobot's asynchronous inference architecture offloads policy computation to a GPU-backed server via a gRPC-based PolicyServer. This server uses Python's pickle.loads() function to deserialize incoming data across multiple RPC endpoints. The gRPC channels have no TLS encryption and no authentication controls by default.

An attacker who can reach the PolicyServer network port sends a crafted pickle payload to any of the exposed RPC endpoints. The pickle.loads() call deserializes the attacker-controlled data and executes arbitrary operating system commands on the host machine. No credentials, no prior access, and no authentication mechanism stands between the attacker and full command execution on the robotics server.

CVE-2026-25874 key facts:

  • CVE ID: CVE-2026-25874
  • CVSS Score: 9.3 (Critical)
  • Affected Component: LeRobot async inference PolicyServer
  • Attack Type: Unauthenticated pickle deserialization RCE via gRPC
  • Authentication Required: None
  • TLS Required: None in default configuration
  • Validated Against: LeRobot v0.4.3 in real-world testing by Resecurity
  • Patch Status: Unpatched at time of initial disclosure

The irony, as researchers noted, is hard to overstate. Hugging Face itself developed the safetensors format specifically to address the dangers of pickle-based serialization in ML model distribution. Yet LeRobot's production inference infrastructure used the exact pickle.loads() pattern that safetensors was built to replace.

Both CVE-2026-4372 and CVE-2026-25874 together confirm that the Hugging Face RCE vulnerability problem is systemic across multiple products, not isolated to a single component.


Why the Hugging Face RCE Vulnerability Is a Critical Supply Chain Threat

The Hugging Face RCE vulnerability CVE-2026-4372 represents a textbook AI supply chain attack. Understanding why this threat class is so dangerous helps security teams build appropriate defenses and risk frameworks.

The AI model supply chain is the new software supply chain:

  • The software supply chain security community spent years building controls around npm packages, PyPI libraries, and container images after SolarWinds and Log4Shell
  • AI models are the new dependency. Organizations that carefully control their software package sources often load AI models from public repositories with no equivalent scrutiny
  • A single poisoned model on Hugging Face Hub can compromise every developer and pipeline that loads it, exactly as a poisoned npm package compromises every application that installs it

Enterprise ML pipeline impact:

  • Organizations running automated ML pipelines that pull models from Hugging Face Hub during training, fine-tuning, or inference are fully exposed
  • CI/CD systems that load models as part of automated testing run with elevated permissions, making them especially high-value targets for credential theft
  • Data science notebooks running from_pretrained() calls in shared Jupyter environments can compromise all users of the shared infrastructure

Cloud credential theft at scale:

  • ML environments almost universally have cloud credentials attached for accessing training data, model registries, and inference infrastructure
  • Successful exploitation of the Hugging Face RCE vulnerability in a cloud- attached ML environment provides immediate access to whatever cloud resources the execution role can reach
  • A compromised AWS credential from an ML environment can enable S3 data exfiltration, EC2 instance deployment, and IAM privilege escalation depending on the attached role's permissions

Regulatory and compliance exposure:

  • Any environment using Hugging Face Transformers versions 4.56.0 through 5.2.x to process personal data, financial records, or regulated information has potential breach notification obligations if compromise is confirmed or suspected
  • GDPR, HIPAA, PCI-DSS, and similar frameworks require assessment and notification when unauthorized code execution could have accessed regulated data
  • Security teams must treat the six-month exposure window as a potential unauthorized access period requiring retroactive investigation

The recurring design problem:

This Hugging Face RCE vulnerability highlights a recurring security failure across the ML ecosystem. Rapid prototyping culture treats model artifacts as inert data files rather than executable inputs. The trust_remote_code=False default created a false sense of security. The kernels auto-loading path bypassed that control entirely without any documentation change or security notice. Until the ML ecosystem treats model loading with the same security scrutiny applied to executing downloaded binaries, this class of vulnerability will continue to emerge.


Five Real-World Attack Scenarios

Scenario 1: Poisoned Model Harvests Cloud Credentials from ML Pipeline

A threat actor creates a convincing Hugging Face Hub repository mimicking a popular open-source model, slightly modifying the repository name to exploit typosquatting. They inject a malicious _attn_implementation_internal attribute into the config.json pointing to a credential harvesting payload. An enterprise ML engineer loads the model in an automated pipeline running on an AWS EC2 instance with a high-privilege IAM role attached. Credentials exfiltrate silently. The attacker uses the harvested IAM credentials to access S3 buckets containing proprietary training data and customer records.

Scenario 2: AI Research Environment Compromise via Shared Jupyter Hub

A university research team shares a JupyterHub environment configured with Transformers 4.56.0 and the kernels package. A malicious actor publishes a research model repository on Hugging Face Hub and promotes it in an academic forum. A researcher loads the model. The RCE payload executes in the shared JupyterHub environment with access to all users' notebooks, credentials, and API tokens. The attacker harvests Hugging Face tokens with write access to the research team's own model repositories and poisons those models to extend the infection to downstream users.

Scenario 3: CI/CD Pipeline Compromise Through Automated Model Evaluation

An enterprise MLOps team runs automated model evaluation in their CI/CD pipeline using a GitHub Actions runner that loads candidate models from Hugging Face Hub using from_pretrained(). An attacker submits a malicious model to an open evaluation benchmark. The CI/CD pipeline loads the model during automated scoring. The RCE payload executes on the GitHub Actions runner with access to all repository secrets, including deployment credentials and production API keys. The attacker uses harvested secrets to push malicious code to production.

Scenario 4: Robotics Infrastructure Attack via LeRobot CVE-2026-25874

A manufacturing facility runs LeRobot on an internet-connected server to manage robotic arm inference tasks. The PolicyServer gRPC port is accessible from the facility network. An attacker on the same network segment sends a crafted pickle payload to the exposed gRPC endpoint. Code executes on the robotics server as the LeRobot process user. The attacker establishes a reverse shell, moves laterally to the operational technology network, and gains visibility into production control systems.

Scenario 5: Hugging Face Token Theft Enables Repository Poisoning Campaign

An attacker exploits CVE-2026-4372 against a prolific open-source AI developer's workstation. The harvested Hugging Face API token has write access to twenty public model repositories with a combined download count of five million per month. The attacker injects malicious config.json modifications into all twenty repositories. Each subsequent download by the community triggers silent RCE on the downloader's system, creating a self-propagating supply chain infection across the broader AI development community.


Detection and Monitoring for the Hugging Face RCE Vulnerability

Detecting exploitation of the Hugging Face RCE vulnerability requires monitoring at the process, network, and cloud credential layers simultaneously.

Endpoint and Process Monitoring

  • Alert on Python processes spawning unexpected child processes such as bash, curl, wget, netcat, or reverse shell indicators immediately after a from_pretrained() call
  • Monitor for file creation events in /tmp/ directories following model loading operations, a common staging location for RCE payloads
  • Detect outbound network connections from Python processes to unexpected external destinations immediately following model downloads
  • Flag execution of system commands from within Python interpreter processes running in ML environments
  • Alert on unexpected SSH key creation or modification events following Python-based model loading activity
  • Monitor for environment variable access patterns targeting AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, HUGGING_FACE_HUB_TOKEN, and similar credential variables from Python processes

Network-Based Detection

  • Monitor DNS queries and outbound connections from ML servers to unexpected external hosts during or immediately after Hugging Face Hub download activity
  • Alert on outbound connections from ML environments to IP addresses not associated with Hugging Face Hub infrastructure
  • Detect unexpected data transfers from ML environments to cloud storage destinations not in your approved data pipeline
  • Monitor for gRPC traffic on non-standard ports from servers running LeRobot PolicyServer components
  • Alert on reverse shell connection patterns originating from Python processes on ML infrastructure

Cloud Credential Monitoring

  • Enable AWS CloudTrail, Azure Activity Logs, and GCP Audit Logs on all cloud accounts associated with ML environments
  • Alert on API calls from ML environment credentials outside normal working hours or from unexpected source IP addresses
  • Implement credential anomaly detection to flag unusual resource access patterns from ML-attached IAM roles
  • Monitor for new IAM user creation, policy modifications, or role assumption events originating from ML environment credentials
  • Enable GuardDuty or equivalent cloud threat detection on all accounts used by ML pipelines

Threat Hunting Guidance

Run these proactive hunts across your ML infrastructure:

  • Query process execution logs for Python spawning shell processes on all ML servers during the August 2025 through March 2026 exposure window
  • Review Hugging Face Hub access logs for models downloaded by your organization that have since been removed or modified
  • Audit all SSH authorized_keys files on ML infrastructure for keys added during the vulnerability exposure window
  • Review cloud credential usage logs for anomalous API calls during August 2025 through March 2026
  • Check all environment variable access logs from Python processes for credential variable reads not associated with legitimate application behavior
  • Scan ML servers for unexpected files in /tmp/ created during the exposure window that may be residual RCE payload artifacts

Mitigation Recommendations for the Hugging Face RCE Vulnerability

These are the concrete actions your DevSecOps and security teams must take immediately to address the Hugging Face RCE vulnerability and prevent future AI supply chain compromises.

Patch Immediately

  • Upgrade Hugging Face Transformers to version 5.3.0 or later immediately on all systems where the library is installed
  • Verify upgrade completion across all environments including development workstations, shared Jupyter environments, CI/CD runners, and production inference servers
  • Confirm the kernels optional package is also updated to a version compatible with the patched Transformers library
  • Run pip show transformers on all ML systems to verify the installed version and confirm patching is complete

Audit for Retroactive Compromise

Given the six-month silent exposure window, assume that any environment running affected versions during August 2025 through March 2026 may have been compromised:

  • Review all cloud credential usage from ML environments during the exposure window for anomalous activity
  • Rotate all cloud credentials, API tokens, SSH keys, and service account secrets accessible from affected ML environments as a precautionary measure
  • Audit SSH authorized_keys files on all ML infrastructure for unauthorized additions
  • Review Hugging Face Hub tokens associated with affected environments and revoke and reissue them
  • Check all model repositories your team controls for unauthorized config.json modifications

Implement AI Model Supply Chain Controls

  • Establish an approved model registry for your organization, an internal mirror of vetted models rather than direct Hugging Face Hub downloads in production
  • Implement model integrity verification by recording checksums of approved models and validating them before loading in production pipelines
  • Restrict from_pretrained() calls in automated pipelines to internal registry sources only, blocking direct Hub downloads in CI/CD and production
  • Review and approve all new model sources before allowing them into automated ML pipelines
  • Audit all models currently in use across your ML infrastructure and confirm their integrity against known-good versions

Isolate ML Execution Environments

  • Run model loading and inference in isolated containers with minimal network access and no cloud credential environment variables attached
  • Use separate IAM roles for model downloading and for production inference, applying least-privilege principles to both
  • Implement network egress controls on ML containers restricting outbound connections to approved Hugging Face Hub domains and your internal registry only
  • Disable SSH key-based authentication from ML execution environments to prevent harvested keys from enabling lateral movement
  • Use ephemeral execution environments for model evaluation in CI/CD pipelines, destroying the environment after each evaluation run

Fix LeRobot PolicyServer Deployments

For any environment running Hugging Face LeRobot:

  • Replace pickle.loads() with secure alternatives including JSON, native protobuf fields, or Hugging Face's own safetensors format
  • Enable TLS on all gRPC channels by replacing add_insecure_port() with add_secure_port() with properly issued certificates
  • Implement authentication on all gRPC endpoints using interceptors and token-based access controls
  • Block all external network access to LeRobot PolicyServer gRPC ports using host-based firewall rules
  • Monitor PolicyServer processes for unexpected child process spawning indicating active exploitation

Zero Trust Controls for ML Infrastructure

  • Apply Zero Trust network access principles to all ML infrastructure, requiring explicit authentication and authorization for every connection
  • Implement secrets management solutions such as HashiCorp Vault or AWS Secrets Manager to remove cloud credentials from environment variables accessible to ML processes
  • Enable just-in-time cloud access for ML pipelines, providing temporary scoped credentials for each pipeline run rather than persistent attached roles
  • Apply microsegmentation isolating ML training, evaluation, and inference environments from each other and from production application infrastructure

What This Means for the Future of AI Security

The Hugging Face RCE vulnerability is not a one-off event. It is a symptom of a structural security problem in how the AI and ML industry treats model artifacts, and security teams need to understand the broader implications.

AI models are executable code, not data files. The single most important mindset shift the security community needs to make is treating AI model artifacts with the same suspicion applied to executable binaries and third-party libraries. A model's config.json is not a passive configuration file. Under the conditions exploited by CVE-2026-4372, it is an executable attack vector. Organizations that audit their npm dependencies for malicious packages but load AI models with no equivalent scrutiny have an unexamined attack surface.

The trust_remote_code safety mechanism is insufficient. CVE-2026-4372 demonstrated that the trust_remote_code=False default provided a false sense of security. The kernels loading path bypassed it entirely. Developers and security teams that believed trust_remote_code=False made their model loading safe were wrong for six months without knowing it. Security controls that can be silently bypassed through undocumented code paths are not security controls.

The six-month silent patch window is a disclosure failure. Transformers 5.3.0 patched CVE-2026-4372 on March 3, 2026 without announcing the security fix. For three months, the fix was available but organizations had no reason to prioritize the upgrade. This silent patching approach protects Hugging Face's reputation in the short term but leaves the security community unable to make informed patching decisions. Coordinated vulnerability disclosure with clear security advisories is a non-negotiable expectation for any library with 2.2 billion installs.

AI supply chain security is the next frontier. Just as the software supply chain security community built SBOMs, dependency auditing, and package signing to address the npm and PyPI threat landscape, the AI security community now needs equivalent controls for model artifacts. Model signing, integrity verification, provenance tracking, and isolated execution environments are not nice-to-have features. They are the baseline security architecture for any organization that treats AI models as production dependencies.


Key Takeaway

The Hugging Face RCE vulnerability CVE-2026-4372 silently exposed millions of ML workflows to remote code execution for six months before a quiet patch and three more months before public disclosure. Any developer calling from_pretrained() on a malicious model with Transformers 4.56.0 through 5.2.x and the kernels package installed executed attacker-controlled Python code on their system with no warning, no unusual behavior, and no security bypass required. The trust_remote_code=False safety control was completely bypassed through an undocumented kernel loading path.

The companion vulnerability CVE-2026-25874 in LeRobot adds a second critical unauthenticated RCE path through pickle deserialization over unsecured gRPC channels, confirming that the Hugging Face RCE vulnerability problem extends beyond a single product to systemic insecure design patterns across the platform's ecosystem.

Every organization using Hugging Face Transformers must patch to version 5.3.0 immediately and conduct a retroactive audit of the six-month exposure window.

Summary of critical actions:

  • Upgrade Hugging Face Transformers to version 5.3.0 or later on every system immediately
  • Rotate all cloud credentials, API tokens, SSH keys, and Hugging Face Hub tokens accessible from ML environments running vulnerable versions
  • Audit cloud credential usage logs from August 2025 through March 2026 for signs of unauthorized access
  • Establish an internal approved model registry to replace direct Hugging Face Hub downloads in production and CI/CD pipelines
  • Implement model integrity verification by checksum before loading any model in automated pipelines
  • Isolate ML execution environments from cloud credentials using secrets management and least-privilege IAM roles
  • Fix LeRobot PolicyServer deployments by replacing pickle with safetensors, enabling TLS, and enforcing gRPC authentication
  • Treat all AI model artifacts as executable inputs requiring the same supply chain scrutiny as software packages
  • Hunt proactively for signs of exploitation during the six-month silent exposure window

The Hugging Face RCE vulnerability is a defining moment for AI security. The organizations that respond correctly will build the model supply chain controls, execution isolation, and credential protection that make their AI infrastructure resilient to this entire class of threat. The organizations that treat this as a simple patch event and move on will remain exposed to the next silent vulnerability in an AI framework they trust completely.


Frequently Asked Questions About the Hugging Face RCE Vulnerability

What is the Hugging Face RCE vulnerability CVE-2026-4372?

The Hugging Face RCE vulnerability CVE-2026-4372 is a critical remote code execution flaw in the Hugging Face Transformers library affecting versions 4.56.0 through 5.2.x when the optional kernels package is installed. It allows attackers to execute arbitrary Python code on any system that loads a malicious AI model using the standard from_pretrained() API call. The vulnerability bypasses the trust_remote_code=False safety control by exploiting an undocumented kernel loading path triggered by a poisoned _attn_implementation_internal attribute in a model's config.json file. No user interaction beyond the standard model loading call is required.

Is the Hugging Face RCE vulnerability being actively exploited?

The Hugging Face RCE vulnerability CVE-2026-4372 was silently patched in March 2026 and publicly disclosed in June 2026. Pluto Security published a proof-of- concept demonstrating successful exploitation at the time of disclosure. Given the six-month exposure window, the massive download volume of affected versions, and the value of the cloud credentials accessible in typical ML environments, security teams should assume that exploitation occurred during the exposure window and conduct appropriate retroactive investigations.

How does the Hugging Face RCE vulnerability bypass trust_remote_code=False?

The Hugging Face RCE vulnerability bypasses trust_remote_code=False through a kernel loading path that was separate from the code path where the safety check was enforced. When the library encountered the _attn_implementation_internal attribute in a model's config.json during loading, it automatically downloaded and imported the referenced kernel module without passing through the trust_remote_code validation logic. This meant the library's primary defense against remote code execution, the explicit user consent flag, was completely ineffective against this specific attack vector.

Why is the Hugging Face RCE vulnerability a supply chain threat?

The Hugging Face RCE vulnerability represents an AI supply chain attack because it weaponizes the model distribution infrastructure that the entire ML industry depends on. An attacker needs only to publish a malicious model on Hugging Face Hub and promote it to their target community. Every developer, automated pipeline, or CI/CD system that loads the model becomes compromised. This mirrors software supply chain attacks like the xz-utils backdoor and malicious npm packages, but targets the AI model dependency chain that most organizations have not yet applied equivalent security scrutiny to.

Who is affected by the Hugging Face RCE vulnerability CVE-2026-4372?

Any individual developer, research team, enterprise ML pipeline, or automated CI/CD system running Hugging Face Transformers versions 4.56.0 through 5.2.x with the optional kernels package installed is affected. Given the library's 2.2 billion total installs and 146 million monthly downloads, the potential affected population is enormous. Environments with cloud credentials attached, including AWS EC2 instances, Google Cloud VMs, and Azure ML workspaces, face the highest risk because successful exploitation provides immediate access to those credentials.

How should organizations respond to the Hugging Face RCE vulnerability?

Organizations should respond to the Hugging Face RCE vulnerability immediately by upgrading all Transformers installations to version 5.3.0 or later. Security teams should then conduct a retroactive investigation of the August 2025 through March 2026 exposure window, rotating all cloud credentials, SSH keys, API tokens, and Hugging Face Hub tokens accessible from affected environments. Cloud access logs should be audited for anomalous activity during that period. Going forward, organizations should establish approved model registries, implement model integrity verification, isolate ML execution environments from cloud credentials, and treat AI model artifacts with the same supply chain scrutiny applied to third-party software packages.

Share
Copyright © Digital Warfare. All rights reserved.
  • Home
  • About
  • Locations