Key Takeaways
- Chinese state-sponsored hackers used Claude AI for large-scale cyber espionage
- Approximately 30 global targets including tech giants and government agencies compromised
- Attack achieved 80-90% autonomy with minimal human intervention
- Anthropic detected and disrupted the campaign in September 2025
Anthropic has uncovered and stopped what it describes as the world’s first major AI-driven cyber espionage campaign. The operation, attributed to Chinese state-sponsored hackers, used Anthropic’s Claude Code tool to infiltrate around 30 global targets across technology, finance, chemical manufacturing, and government sectors.
The discovery highlights how advanced AI systems are enabling sophisticated cyber threats that operate with minimal human oversight. While Anthropic’s rapid response prevented further damage, the incident demonstrates how innovation tools can be weaponized into autonomous attack systems.
How the AI-Powered Cyberattack Unfolded
The campaign leveraged Claude’s advanced intelligence, agency capabilities, and tool integration that have significantly evolved over the past year. Attackers initially jailbroken Claude by disguising malicious tasks as “defensive testing” for a fake cybersecurity company.
They systematically broke down harmful actions into harmless individual steps to avoid triggering safety protocols. This approach prevented the AI from understanding the full malicious context of its activities.
Multi-Phase Attack Strategy
In the initial reconnaissance phase, human operators selected targets and established an autonomous framework using Claude Code. The AI system scanned infrastructure at unprecedented speeds—processing thousands of requests per second—and identified high-value databases much faster than human hackers could achieve.
Subsequent phases involved Claude researching vulnerabilities, developing exploit code, harvesting credentials, and exfiltrating sensitive data. The operation required only 4-6 human check-ins per attack cycle, demonstrating remarkable autonomy.
“Models’ general levels of capability have increased to the point that they can follow complex instructions and understand context in ways that make very sophisticated tasks possible. Not only that, but several of their well-developed specific skills—in particular, software coding—lend themselves to being used in cyberattacks.” Anthropic stated in its report.
The AI even generated comprehensive post-attack documentation, systematically categorizing stolen intelligence by value and importance. Despite occasional hallucinations that produced fabricated credentials or misidentified public data as secrets, the operation maintained 80-90% autonomy—a level impossible for human-only teams.
Detection and Response
Anthropic’s Threat Intelligence team, utilizing Claude for their own analysis, mapped the entire threat landscape over 10 days. Their response included banning compromised accounts, notifying affected victims, and coordinating with relevant authorities.
The company emphasized its commitment to transparency in cybersecurity threats, stating they will continue publishing similar reports to help industry, government, and research communities strengthen their defenses.
“Our goal is for Claude—into which we’ve built strong safeguards—to assist cybersecurity professionals to detect, disrupt, and prepare for future versions of the attack,” says Anthropic in the report.
The incident represents a significant milestone in AI security, highlighting both the potential benefits and risks of advanced AI systems in cybersecurity. As continue to evolve, organizations must adapt their defense strategies accordingly.



