2026-06-16

Banned for fixing bugs

The Fable 5 export-control crisis dominated the day: new reporting shows the "jailbreak" that got Anthropic's top models pulled was just asking the model to fix vulnerable code, and 100+ security veterans want the order revoked. Around it, the business of AI ground on — Anthropic walked back a billing change, DeepSeek took its first outside money at $50B, and OpenAI's $34B burn leaked. Plus shipping news: Gemma 4 on Bedrock, GitHub renting AWS capacity, and a new "reassuringly hard" coding benchmark.

The Fable 5 'jailbreak' was just 'fix this code'

New reporting clarifies what triggered the U.S. Commerce Department's export-control order that forced Anthropic to pull Fable 5 and Mythos 5 offline worldwide. Per cybersecurity researcher Katie Moussouris, who reviewed the non-public Amazon paper, Fable refused to 'review the code for security issues' but complied when asked to 'fix this code' on deliberately vulnerable files — the find-fix-test loop defenders run daily, not a guardrail bypass. Over 100 security experts, including Alex Stamos, Jon Callas, and Rachel Tobac, signed an open letter calling the order dangerous and noting the same results reproduce on GPT-5.5, Opus 4.8, Sonnet, and Kimi 2.7. Axios sources frame the action as driven by 'personality differences' with the administration rather than a real technical threat.

Why it matters: Frontier-model access is now entangled with national-security process, not just evals — a single unilateral letter took the strongest coding/security models away from defenders overnight, and the precedent applies to any U.S. lab.

The Fable 5 Export Controls Harm US Cyber Defense (Simon Willison)
The US government's Anthropic models ban was never about an AI jailbreak (TechCrunch AI)
Cybersecurity vets protest 'dangerous' US government ban on Anthropic's most powerful models (TechCrunch AI)
The US government may be asking Anthropic the impossible by demanding unhackable LLMs (The Decoder)
"They screwed us": Personality clashes sent Anthropic's models offline (Simon Willison)
Quoting Matteo Wong, The Atlantic (Simon Willison)

Anthropic kills its Agent SDK billing overhaul before it ships

Anthropic paused a billing change slated for June 15 that would have stopped the Agent SDK, claude -p, and third-party apps from drawing on regular subscription limits. The plan gave each plan a fixed monthly credit ($20 Pro up to $200 Enterprise) before falling back to usage-based API pricing. 'Nothing changes for now,' the company says. The reversal follows April's move to bar third-party tools like OpenClaw from subscription limits, and lands amid an IPO filing, the Fable export-control mess, and a reported OpenAI plan to slash API prices.

Why it matters: If you build on the Agent SDK or run agents through claude -p, your costs stay inside your subscription for now — but the reprieve looks tactical, tied to a looming price war rather than a change of heart.

Anthropic backs off unpopular billing overhaul as price war with OpenAI looms (The Decoder)

DeepSeek takes its first outside money at a $50B valuation

DeepSeek raised over 50 billion yuan (~$7.4B) in its first external round, valuing the company above $50B — up from talk of a $10B valuation in April. The structure is unusual: most investors put money into a limited partnership run by CEO Liang Wenfeng with no voting rights and a five-year lock-up, with only China's state AI fund investing directly. Tencent and CATL are among backers. DeepSeek made its 75% V4 Pro discount permanent, pricing roughly 11x cheaper on input and 35x cheaper on output than GPT-5.5.

Why it matters: The open-weights leader now has war-chest funding and aggressive permanent pricing — relevant if you're weighing self-hostable frontier models against the US closed-API incumbents, especially after the Fable shutdown.

DeepSeek takes outside money for the first time at a $50 billion valuation (The Decoder)

Gemma 4 lands on Bedrock with an OpenAI-compatible endpoint

AWS added Google DeepMind's Apache-2.0 Gemma 4 family to Bedrock: a 31B dense model, a 26B-A4B MoE (3.8B active), and a 5.1B E2B (2.3B effective). All support reasoning mode, native function calling, and text+image input, with 256K context on the larger two. Access is through the new bedrock-mantle endpoint, which speaks the OpenAI Chat Completions and Responses APIs — existing OpenAI SDK code switches by changing only the base URL and model ID. Artificial Analysis reports an Intelligence Index of 39 for the 31B, well above the 4B–40B open-weights median of 15.

Why it matters: An OpenAI-wire-compatible Bedrock endpoint plus open-weight models means low-friction migration and self-hosting flexibility — a hedge worth noting in a week defined by sudden model unavailability.

Introducing Gemma 4 models on Amazon Bedrock (AWS Machine Learning)

GitHub rents AWS capacity as agentic commits 14x in a year

Microsoft is adding AWS capacity to keep GitHub running after an AI-driven surge strained the platform, per Business Insider — awkward, given the 2018 plan to fold GitHub onto Azure by 2027. GitHub's COO says commits are on pace for 14 billion in 2026, up from 1 billion in 2025; its CTO says an October 2025 plan to add 10x capacity was revised to 30x by February. GitHub is now serving 40% of monolith traffic from Azure (up from 8% in February) while battling outages, and Azure itself remains capacity-constrained through 2026.

Why it matters: Agent-generated commits, PRs, and Actions runs are hammering infrastructure built for human-paced teams — a concrete reminder that the bottleneck for AI coding tools is increasingly the platform underneath, not the model.

Microsoft turns to AWS as GitHub faces AI capacity crunch (Hacker News)

Cognition's FrontierCode benchmark stumps Opus 4.8 at 13%

Cognition (makers of Devin) released FrontierCode, a coding benchmark hand-built by 20 open-source maintainers from multi-PR chains, grading for mergeability — correctness, test quality, scope discipline, style, and codebase conventions — not just passing tests. It has 150 tasks across three tiers. Claude Opus 4.8 leads the hardest 'Diamond' tier at just 13.4%, followed by GPT-5.5 (6.3%) and Opus 4.7 (5.2%); Fable reportedly hits ~30%. Jack Clark's Import AI also flagged Xiaomi's 1T-param MiMo-V2.5 hitting 1000 tokens/sec on a commodity 8-GPU node, and Sequent, a new alignment nonprofit raising $100–150M.

Why it matters: SWE-Bench is saturating; FrontierCode's mergeability-focused grading is a harder, more realistic signal of whether agents write code you'd actually merge — useful for benchmarking your own model picks.

Import AI 461: "Alignment is not on track"; FrontierCode; and synthetic research interns (Import AI (Jack Clark))

A vision-language model ran in orbit for the first time

Loft Orbital's YAM-9 satellite used Google DeepMind's Gemma 3 to identify areas of interest from natural-language queries on-orbit — the first reported use of a VLM in space. NASA JPL's NAVI-Orbital package acted as the harness, trimmed to fit limited memory, running on an Nvidia Jetson Orin AGX. Instead of dumping raw imagery to ground analysts, the satellite did its own triage in response to prompts like 'monitor this border and flag anything suspicious.' Planet Labs and Kepler are reportedly exploring similar edge deployments.

Why it matters: A concrete edge-deployment proof point: an off-the-shelf small VLM plus a stripped-down harness, doing useful triage on constrained hardware far from any data center.

A satellite just learned to find things on its own — here's what that means (TechCrunch AI)

Also worth a look

OpenAI burned through $34 billion last year (The Decoder)
Salesforce acquires AI customer service platform Fin (Intercom) for $3.6B (TechCrunch AI)
Nvidia joins AI debt boom with $20 billion bond sale (The Decoder)
Sarvam becomes India's newest AI unicorn with $234M round led by HCLTech (TechCrunch AI)
As AI agents become employees, NewCore emerges with $66M to give them identities (TechCrunch AI)
Cloudflare grows its AI team with talent from Ensemble AI (Cloudflare Blog)
AI Agent Failure Detection and Root Cause Analysis with Strands Evals (AWS Machine Learning)
Can Europe train a frontier AI model on the compute it owns? (euromesh) (Hacker News)
GitHub publishes an open multilingual repositories dataset (CC0) (GitHub Blog)
Power-flexible data centers: give the grid some flex (MIT Technology Review)