Technology Edition · 175
Decoding Data Science & AI
The era of heavily subsidized “unlimited” generative computing is ending abruptly. A metered infrastructure model is taking its place, just as autonomous agents move from experiments to real-world systems that can execute actions on their own and are sometimes dangerously volatile.
This Week’s Highlights
World’s First Fully Autonomous LLM Agent Cyberattack — Marimo Compromised
An autonomous LLM agent executed a complete network intrusion with zero human intervention — exploiting CVE-2026-39987 in Marimo. It harvested AWS credentials, initiated 8 parallel SSH sessions, and fully exfiltrated a production PostgreSQL database in under an hour. The DB reconnaissance phase took less than 2 minutes.
Google Launches Gemma 4 — Edge Multimodality & MoE Open-Sourced under Apache 2.0
Four model sizes spanning E2B to 31B Dense. The 26B MoE activates only 3.8B parameters per token yet scores 88.3% on AIME 2026 maths (up from 20.8% on Gemma 3 27B) and 77.1% on LiveCodeBench v6 — world-class results at lightweight inference cost, freely available commercially.
End of “Unlimited” AI Coding — GitHub Copilot & OpenAI Codex Go Metered
Effective June 1, 2026: all GitHub Copilot plans migrate to strict credit-based token billing. Concurrently, OpenAI’s ChatGPT Pro 2× promotion expired, halving effective Codex limits. Industry data reveals flat-rate plans were subsidising automated agent usage by 15–30× versus actual API run costs.
Anthropic Splits Billing & Signs 300 MW SpaceX Compute Deal
From June 15, Anthropic separates interactive and automated agent usage pools. Automated agents (Claude SDK, headless Claude Code, GitHub Actions) draw from a separate allowance: ~$20/mo Pro, ~$200/mo Max. A SpaceX Colossus 1 partnership brings 220,000 NVIDIA GPUs online, doubling Claude Code’s 5-hour rate limits.
UAE Launches World’s First “Agentic AI Government” + 80,000-Staff Training
The UAE Cabinet approved migration of 50% of all government services to autonomous agentic AI by 2028. A strategic MBZUAI partnership will certify 80,000 federal employees as Agentic AI experts — moving from awareness training to hands-on deployment, auditing, and real-world agent system management.
Chinese Open-Weight Models Surge to 45% of Global Agentic Traffic on OpenRouter
Per JPMorgan’s strategist Michael Cembalest: Chinese AI models went from under 2% of OpenRouter traffic in late 2024 to over 45% by mid-2026. MiniMax M2.5 matches Claude Opus 4.6 on SWE-Bench (80.2% vs 80.8%) at just $0.30/M tokens — 17–30× cheaper than comparable Western flagships.
Shanghai Futures Exchange Designs AI Token Futures — Tokens Become a Commodity
The Shanghai Futures Exchange is building financial futures contracts tied directly to AI tokens. China’s daily token consumption has surged 1,000× since 2024, exceeding 140 trillion tokens/day. Like jet-fuel futures for airlines, software companies will soon hedge multi-year AI spend via token derivative contracts.
Anatomy of the First Autonomous LLM Cyberattack
CVE-2026-39987, a pre-authentication RCE flaw in Marimo’s WebSocket interface, was exploited end-to-end by an LLM agent with zero human input. The leaked planning phrase “see what else we can do” is characteristic of LLM step-by-step reasoning and serves as textual evidence of autonomous agency.
| Old Approach | Why It Fails Against LLM Agents | New Requirement |
|---|---|---|
| Signature-based detection | Agents dynamically rewrite command syntax based on shell feedback — no fixed pattern to match | Behavioral telemetry: credential access anomalies, lateral movement, egress spikes |
| Open notebook environments | Interactive notebooks treated as low-risk provide perfect foothold for agent exploration | Containerized sandboxes (GKE Agent Sandbox) with strict network egress routing |
| Broad API credential scopes | Agent harvested AWS credentials then chained to Secrets Manager — broad scope enabled full compromise | Strict token-scoping on all local and cloud environments where agents are tested |
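One of the behavioral signals in the table, egress spikes, can be illustrated with a minimal sketch. It assumes per-minute outbound byte counts are already being collected. The function name `egress_spikes`, the window size, and the threshold are hypothetical choices for illustration, not taken from any incident report or product.

```python
from statistics import mean, stdev

def egress_spikes(bytes_per_minute, window=10, threshold=3.0):
    """Return indices whose egress volume exceeds the trailing-window mean
    by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(bytes_per_minute)):
        history = bytes_per_minute[i - window:i]
        mu = mean(history)
        # Guard against a perfectly flat baseline (sigma == 0).
        sigma = max(stdev(history), 1.0)
        if (bytes_per_minute[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical traffic: a steady baseline, then one large outbound burst.
baseline = [1000, 1040, 980, 1010, 995, 1020, 1005, 990, 1015, 1000]
traffic = baseline + [1010, 250_000, 1005]
print(egress_spikes(traffic))  # [11]
```

A real deployment would likely combine several signals (credential access, lateral movement, egress) rather than rely on a single z-score, since one statistic is easy to evade by exfiltrating slowly.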
Gemma 4 — Architecture & Benchmark Analysis
The 26B MoE model uses 128 expert networks in total and activates only 8 routed experts plus 1 shared expert per token, so per-token compute is close to that of a 4B dense model (the full 26B weights must still be held in memory) while delivering frontier-level knowledge. Released under Apache 2.0, which permits commercial use.
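The routing idea can be sketched in a few lines of plain Python. This is a generic top-k router, not Gemma 4's actual implementation: the softmax over only the selected experts, the random logits, and the assumption that all experts are the same size are illustrative choices.

```python
import math
import random

NUM_EXPERTS = 128  # routed experts in total (from the text)
TOP_K = 8          # routed experts activated per token (from the text)

def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return {expert_id: weight},
    where the weights are a softmax over the selected logits only."""
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    peak = max(router_logits[i] for i in chosen)  # subtract for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

rng = random.Random(0)  # stand-in for a learned router's output
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits)

# Share of expert blocks touched per token, counting the always-on shared
# expert and assuming (hypothetically) equal-sized experts.
active_fraction = (TOP_K + 1) / (NUM_EXPERTS + 1)
print(sorted(weights), f"{active_fraction:.1%}")
```

Only the selected experts run a forward pass for a given token, which is why compute scales with the active parameter count while storage scales with the total.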
| Model | Type | Active Params | Best For |
|---|---|---|---|
| E2B | Effective | ~2B | Mobile / IoT |
| E4B | Effective | ~4B | Edge / Jetson / Pi |
| 26B MoE | Sparse | 3.8B active | Agentic / Coding |
| 31B Dense | Dense | 31B active | Max reasoning |
| Benchmark / Spec | Gemma 3 27B | Gemma 4 26B MoE |
|---|---|---|
| AIME 2026 | 20.8% | 88.3% |
| LiveCodeBench v6 | — | 77.1% |
| Context Window | 128K | 256K |
| Tool Calling | External | Native |
The End of Unlimited AI — Billing Transition
Flat-rate subscriptions were subsidising automated agent usage by 15–30× relative to actual API run costs. Agentic workflows consume far more tokens than standard chat queries: recursive loops, codebase indexing, and automated debugging all generate large volumes of output. The economics are now being corrected simultaneously across the industry.
| Provider | Change | Effective Date | Impact | Automated Agent Allowance |
|---|---|---|---|---|
| GitHub Copilot | Flat-rate → credit-based tokens | June 1, 2026 | All plans affected; prices unchanged | Credit pool (metered) |
| OpenAI Codex Pro | 2× promo expired | May 31, 2026 | Effective usage halved overnight | $100/mo standard cap |
| Anthropic Pro | Interactive/automated split | June 15, 2026 | Automated agents billed separately | ~$20/mo |
| Anthropic Max | Interactive/automated split | June 15, 2026 | Rate limits doubled via SpaceX GPUs | ~$200/mo |
Audit all CI/CD pipelines, cron jobs, and automated scripts before June 15. Calculate your monthly token velocity for heavy reasoning loops.
Use local models (Gemma 4 E4B) for routine syntax edits. Implement aggressive prompt caching. Restrict context windows to active workspace — never pass entire codebases.
For high-volume automation, migrate from subscription accounts to dedicated API keys to prevent critical builds failing mid-month.
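The audit step above can start as a simple spreadsheet-style calculation. The sketch below is one hedged way to do it. The job names, run counts, token counts, and per-million prices are all hypothetical placeholders, and only the ~$20/month allowance figure comes from this edition.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    runs_per_month: int
    input_tokens_per_run: int
    output_tokens_per_run: int

def monthly_cost_usd(job, input_price_per_m, output_price_per_m):
    """Estimated monthly spend for one job at per-million-token prices."""
    input_m = job.runs_per_month * job.input_tokens_per_run / 1_000_000
    output_m = job.runs_per_month * job.output_tokens_per_run / 1_000_000
    return input_m * input_price_per_m + output_m * output_price_per_m

def audit(jobs, allowance_usd, input_price_per_m, output_price_per_m):
    """Return (per-job costs, total cost, True if total exceeds allowance)."""
    costs = {j.name: monthly_cost_usd(j, input_price_per_m, output_price_per_m)
             for j in jobs}
    total = sum(costs.values())
    return costs, total, total > allowance_usd

# Hypothetical pipeline inventory and hypothetical prices (USD per 1M tokens).
JOBS = [
    Job("nightly-refactor", 30, 400_000, 60_000),
    Job("pr-review", 200, 50_000, 4_000),
]
costs, total, over = audit(JOBS, allowance_usd=20.0,
                           input_price_per_m=3.0, output_price_per_m=15.0)
for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f}/month")
print(f"total: ${total:,.2f}/month, exceeds allowance: {over}")
```

Replacing the placeholder numbers with measured token counts from pipeline logs gives a first estimate of which jobs should move to dedicated API keys or local models.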
Result of the SpaceX compute deal: Claude Code rolling rate limits doubled, peak-hour throttling removed, and Claude Opus API limits raised substantially.
UAE — World’s First Agentic AI Government
Unlike basic chatbot deployments, the UAE’s agentic systems are authorized to plan, call APIs, access administrative databases, and execute multi-step government workflows with minimal human oversight. This creates both an extraordinary professional opportunity and a rigorous engineering responsibility for the region’s developer community.
Sheikh Hamdan bin Mohammed’s private-sector directive, supported by dedicated digital incubators and development funds, extends the transformation beyond government — signalling a whole-economy shift to agentic-first operations within two years.
Skills now in critical demand: agent state management, deterministic routing, multi-agent orchestration frameworks, rigorous audit logging, and responsible agentic transformation certification.
Chinese Model Surge — The OpenRouter Shift
Chinese open-weight models surged from under 2% → 45%+ of global developer traffic on OpenRouter — the world’s largest LLM aggregation platform serving 5M+ developers. Usage is disproportionately concentrated in high-volume agentic flows where price-per-token is the dominant decision variable.
| Model | Origin | SWE-Bench Score | Price / 1M Tokens | vs. Claude Opus 4.6 |
|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic (US) | 80.8% | ~$5.00 | Baseline |
| MiniMax M2.5 | MiniMax (Shanghai) | 80.2% | $0.30 | ~17× cheaper |
| Kimi K2.5 | Moonshot AI (China) | — | ~$0.15 (input) | Top-3 OpenRouter |
| GLM-5 | Zhipu AI (China) | — | Competitive | Top-3 OpenRouter |
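To see what the price gap means for a budget, here is a small worked calculation. It uses the $0.30 per million tokens and the 17–30× multiple quoted in this edition. The monthly token volume is a hypothetical figure, and a single blended price per token is a simplification (real pricing usually separates input and output tokens).

```python
def workload_cost_usd(tokens, price_per_million):
    """Cost of processing `tokens` at a flat price per million tokens."""
    return tokens / 1_000_000 * price_per_million

MINIMAX_PRICE = 0.30            # USD per 1M tokens (from the text)
MULTIPLE_RANGE = (17, 30)       # "17–30× cheaper" (from the text)
MONTHLY_TOKENS = 2_000_000_000  # hypothetical agentic workload

open_weight = workload_cost_usd(MONTHLY_TOKENS, MINIMAX_PRICE)
flagship_low, flagship_high = (open_weight * m for m in MULTIPLE_RANGE)

print(f"open-weight model: ${open_weight:,.0f}/month")
print(f"comparable flagship: ${flagship_low:,.0f} to ${flagship_high:,.0f}/month")
```

At this volume the gap is the difference between hundreds and tens of thousands of dollars per month, which is consistent with price-per-token dominating decisions in high-volume agentic flows.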
Token Futures — AI Compute Becomes a Commodity
The Shanghai Futures Exchange designs derivatives tied to the AI token directly — treating compute’s fundamental digital fuel as the traded unit, not the hardware that generates it.
By contrast, CME Group and ICE are designing futures tied to GPU server rental time, a physical-hardware model. The structural divergence mirrors oil versus electricity futures: both are valid, but they are fundamentally different hedging instruments.
The analogy: Airlines purchase jet-fuel futures to protect profit margins against oil price spikes. Software companies will soon purchase token futures contracts to lock in API costs for multi-year contracts. For technology leaders, understanding token economics and compute hedging is becoming as vital as choosing the correct model architecture.
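The hedging logic in the jet-fuel analogy can be made concrete with a toy calculation. Every number below is hypothetical. No contract specification for token futures has been published in this edition, and the sketch ignores margin, fees, and basis risk.

```python
def hedged_cost(tokens_m, spot_price, futures_price, hedge_ratio):
    """Effective spend when `hedge_ratio` of usage is locked in at
    `futures_price` and the rest is bought at `spot_price`.
    `tokens_m` is usage in millions of tokens; prices are USD per 1M."""
    if not 0.0 <= hedge_ratio <= 1.0:
        raise ValueError("hedge_ratio must be between 0 and 1")
    locked = tokens_m * hedge_ratio * futures_price
    floating = tokens_m * (1.0 - hedge_ratio) * spot_price
    return locked + floating

TOKENS_M = 10_000      # hypothetical: 10 billion tokens over the period
FUTURES_PRICE = 0.50   # hypothetical locked-in price, USD per 1M tokens

for spot in (0.30, 0.50, 0.80):  # hypothetical spot-price scenarios
    unhedged = hedged_cost(TOKENS_M, spot, FUTURES_PRICE, 0.0)
    hedged = hedged_cost(TOKENS_M, spot, FUTURES_PRICE, 0.75)
    print(f"spot ${spot:.2f}: unhedged ${unhedged:,.0f}, 75% hedged ${hedged:,.0f}")
```

The hedged figures vary less across scenarios than the unhedged ones. The trade-off is that the buyer gives up part of the saving when spot prices fall.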
Key Takeaways for Professionals
Strategic Data Reference
| Metric | Late 2024 / Early 2025 | June 2026 | Shift |
|---|---|---|---|
| Chinese model share on OpenRouter | <2% | >45% | +2,150%+ |
| China daily token consumption | ~140B/day | 140T/day | +1,000× |
| Gemma AIME 2026 score | 20.8% (Gemma 3) | 88.3% (Gemma 4 MoE) | +325% |
| GitHub Copilot billing model | Flat-rate unlimited | Metered credits | Paradigm shift |
| Agent subsidy ratio (flat-rate) | 15–30× actual cost | Corrected to actual | Subsidy removed |
| UAE federal staff in Agentic AI | Minimal | 80,000 (target) | World first |
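The Shift column mixes relative percentage changes and multiples. A short check, using only the figures in the table, shows how each value is derived. The helper names are illustrative.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100

def multiple(old, new):
    """How many times larger new is than old."""
    return new / old

# OpenRouter share: <2% to >45%, so the result is a lower bound.
share_shift = percent_change(2, 45)

# AIME 2026: 20.8% to 88.3% (relative change, not percentage points).
aime_shift = percent_change(20.8, 88.3)
aime_points = 88.3 - 20.8

# China daily tokens: ~140 billion to 140 trillion per day.
token_multiple = multiple(140 * 10**9, 140 * 10**12)

print(f"share: +{share_shift:,.0f}%")
print(f"AIME: +{aime_shift:.1f}% relative, +{aime_points:.1f} points")
print(f"tokens: {token_multiple:,.0f}x")
```

The AIME row is the one most easily misread: the score rose by 67.5 percentage points, which is roughly a 325% relative increase over the Gemma 3 baseline.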
With GitHub Copilot and Anthropic migrating automated workflows to metered credit billing, how is your organisation auditing its current agentic pipeline costs? Are you planning to migrate high-volume tasks to cost-efficient open-weight models like Gemma 4 or MiniMax M2.5? Share your strategy in the comments below.