Anthropic’s Too Dangerous AI Model Mythos Breached on Day One – Here’s What Went Wrong
Anthropic’s “Too Dangerous” AI Model Mythos Breached on Day One — Here’s What Went Wrong
In a stunning embarrassment for one of the world’s most prominent AI safety companies, Anthropic’s flagship cybersecurity AI model — Mythos — was accessed by unauthorized users on the very day of its limited public announcement. The breach, first reported by Bloomberg and corroborated through screenshots and live demonstrations, exposes deep cracks in the security posture of a company that built its brand on responsible AI development.
The incident has sent shockwaves through the cybersecurity community, raised alarm bells in government circles, and prompted serious questions about whether any AI system labeled “too dangerous to release” can truly be kept under wraps.

How a Discord Group Got Inside Anthropic’s Most Restricted AI
The breach did not require sophisticated hacking tools or zero-day exploits. Instead, it was accomplished through a chain of human errors and leaked information that reads more like a cautionary tale than a cyberattack:
- Contractor credentials: One member of a private Discord group held access as a third-party contractor working for Anthropic. This provided the initial foothold into Anthropic’s vendor environment.
- Leaked naming conventions: A separate data breach at AI training startup Mercor had previously exposed internal knowledge about Anthropic’s naming practices and infrastructure patterns. The group used this leaked information to deduce where Mythos was hosted.
- URL guessing: Armed with the naming conventions from the Mercor leak, the group successfully guessed the URL endpoint where the Mythos model was running — bypassing any need for traditional exploitation.
According to Bloomberg, the group — described as a “handful” of users in a private online forum — gained access on the same day Anthropic announced it was releasing Mythos Preview to a select group of approximately 40 companies, including Apple, Microsoft, Google, Goldman Sachs, Amazon, and Cisco, under its Project Glasswing program.
Anthropic confirmed the incident, stating: “We’re investigating a report claiming unauthorized access to Claude Mythos Preview through one of our third-party vendor environments.” The company emphasized that no breaches had been detected within Anthropic’s own core systems.
Why Mythos Is Considered “Too Dangerous”
Mythos is not just another large language model. Anthropic has positioned it as the most cybersecurity-capable AI system ever built — and that capability cuts both ways.
The UK’s AI Security Institute (AISI), widely regarded as the world’s leading authority on AI safety evaluation, concluded that Mythos represents a significant “step up” from previous models in terms of cyber-threat potential. The institute found that Mythos could:
- Execute multi-step cyberattacks requiring sequential coordinated actions
- Discover vulnerabilities in IT systems without human intervention
- Complete tasks that would normally take human security professionals days to accomplish
Most notably, Mythos became the first AI model to successfully complete AISI’s 32-step cyber-attack simulation, solving the challenge in three out of ten attempts. This benchmark was specifically designed to test autonomous hacking capabilities — and Mythos passed it.
On the defensive side, the results are equally impressive. Anthropic used Mythos to discover a 27-year-old security vulnerability in OpenBSD, an operating system renowned for its security-first design. Mozilla reported using a preview of Mythos to identify and patch 271 vulnerabilities in its software ecosystem.
The Irony Was Not Lost on Anyone
The breach’s bitter irony drew immediate attention across social media and tech commentary. An AI model specifically designed to find security vulnerabilities — and deemed too dangerous for public release because of its hacking prowess — was itself compromised through basic operational security failures.
“If some group — some random Discord online forum — got access to it, it’s already been breached by China,” said David Lindner, chief information security officer at Contrast Security and a 25-year industry veteran, in an interview with Fortune. Lindner noted that he was not surprised by the breach, pointing out that even though Anthropic limited initial access to 40 companies, “thousands of people likely had access to the program” through the expanded partner ecosystem.
Not everyone viewed the situation with the same alarm. OpenAI CEO Sam Altman publicly characterized Anthropic’s promotion of Mythos’s danger level as “fear-based marketing”, suggesting that the company was inflating the model’s threat profile for competitive advantage.
UK AI Minister Kanishka Narayan took a different stance, warning that UK businesses “should be worried” about the model’s ability to spot flaws in IT systems — flaws that malicious actors could then exploit at scale.
A Pattern of Security Lapses
The Mythos breach is not an isolated incident for Anthropic. It is at least the company’s second major security lapse in recent months:
- February 2025: An early version of Anthropic’s Claude Code tool accidentally exposed its source code in a publicly accessible database — first reported by Fortune.
- March 2026: Anthropic leaked its own AI coding tool’s source code in what Fortune called a “second major security breach,” just days after the Mythos details were accidentally revealed.
- April 2026: The Mercor data breach exposed naming conventions that directly enabled the Mythos unauthorized access.
This pattern has drawn scrutiny from government agencies. The Pentagon has formally blacklisted Anthropic as a supply-chain risk, and former President Trump publicly stated he had terminated the company’s government contracts. Meanwhile, the U.S. Cyber Command and the National Security Agency have reportedly been independently testing Mythos for offensive cyber operations, underscoring the dual-use nature of the technology.
Project Glasswing: A Defensive Alliance Under Threat
Project Glasswing was Anthropic’s carefully curated program to distribute Mythos to a trusted circle of technology and financial institutions for defensive red-teaming purposes. The premise was straightforward: give the world’s most security-capable AI to the organizations best positioned to use it defensively, and patch vulnerabilities before malicious actors can exploit them.
The breach fundamentally undermines that premise. If a small Discord group can gain access through a contractor credential and educated URL guessing, the assumption that only “trusted partners” have access to Mythos is clearly false. This raises the prospect that other actors — including nation-state adversaries — may have already gained similar access through more sophisticated means.
The UK’s AISI noted that Mythos’s cyber capabilities could enable attacks that require multiple coordinated steps — the kind of complex operations typically associated with advanced persistent threats (APTs) backed by nation-states. If those capabilities are now in the hands of unauthorized users, the defensive advantage that Project Glasswing was supposed to create may already be neutralized.
What Organizations Need to Do Now
The Mythos breach carries immediate implications for organizations of every size. Here is what security leaders should prioritize:
1. Audit Third-Party Contractor Access
The breach originated through a third-party contractor’s credentials. Organizations must conduct a thorough review of all external access to critical systems, implementing strict least-privilege principles and mandatory multi-factor authentication for any contractor accounts.
2. Assume AI-Enabled Attacks Are Coming
Mythos demonstrated the ability to autonomously discover vulnerabilities and execute multi-step attacks. Defensive teams should assume that attackers — whether using Mythos or similar tools — will soon be able to automate discovery and exploitation at unprecedented speed and scale.
3. Harden Naming Conventions and Infrastructure Secrets
The breach was enabled partly by leaked naming conventions from a separate company (Mercor). Organizations should treat internal naming patterns, URL structures, and infrastructure details as sensitive information that, if exposed, could aid attackers in reconnaissance.
4. Invest in AI-Powered Defense
As Lindner noted, smaller companies may be disproportionately vulnerable to AI-fueled attacks because they lack the resources to keep pace with escalating threat complexity. Investing in AI-assisted security tools — including vulnerability scanning, anomaly detection, and automated response — is no longer optional for organizations that want to stay protected.
“The rapid rise of AI as a tool for cyberattacks could disproportionately affect smaller companies, who may not be able to keep up with the increasing complexity of AI-fueled attacks.”
— David Lindner, CISO at Contrast Security
The Bigger Picture: AI Safety vs. AI Security
The Mythos incident highlights a fundamental tension in the AI industry: the distinction between AI safety (preventing AI systems from causing harm) and AI security (preventing unauthorized access to AI systems). Anthropic has built its reputation on safety — on ensuring its models are aligned, honest, and harmless. But the Mythos breach is a security failure, and it demonstrates that even the most safety-conscious AI company can struggle with basic operational security.
As AI models become more powerful, both dimensions matter equally. A model that is perfectly aligned but easily accessible to malicious actors is just as dangerous as a model that is perfectly secured but fundamentally misaligned. The industry must address both challenges simultaneously.
What Happens Next
Anthropic’s investigation is ongoing. The company has not yet disclosed the full scope of what the unauthorized users accessed or whether any proprietary model weights, training data, or internal systems were compromised. Given that the Discord group told Bloomberg they were primarily interested in “playing around” with the technology rather than causing damage, the immediate risk of weaponization may be limited — but the precedent is set.
As AI models grow more capable, the stakes for securing them grow proportionally. The Mythos breach is a wake-up call: if Anthropic — a company whose entire brand is built on AI safety — cannot keep its most dangerous model under wraps, no organization can assume its own AI systems are safe from unauthorized access.
The question is no longer whether AI-powered cyberattacks will become reality. The question is whether defenders can adapt fast enough to meet them.
Key Sources
- Bloomberg: “Anthropic’s Mythos Model Is Being Accessed by Unauthorized Users”
- Fortune: “A group of users leaked Anthropic’s AI model Mythos by reportedly guessing where it was located” (April 23, 2026)
- The Guardian: “Anthropic investigates report of rogue access to hack-enabling Mythos AI” (April 22, 2026)
- The Verge: “Anthropic’s Mythos breach was humiliating”
- Cybernews: “Discord group accessed Anthropic’s Mythos without authorization”
- Tom’s Hardware: “How a cavalcade of blunders gave unauthorized users access to Claude Mythos”
- UK AI Security Institute (AISI) evaluation of Mythos capabilities
📖 Related: The $200 vs. $0 Showdown: Why Developers Are Ditching Claude Code for Open-Source Goose
📖 Related: Anthropic’s Mythos Breach Was Humiliating — And It Should Terrify the AI Industry
📖 Related: Salesforce Transforms Slackbot Into an AI-Powered Workplace Agent




