=== TAG === AI Models === HEADLINE === DeepSeek V4-Pro Just Tied Claude Opus On SWE-Bench === META_DESC === DeepSeek V4-Pro hit 80.6% on SWE-bench Verified, within 0.2 points of Claude Opus 4.6, but it costs $1.74/$3.48 per million tokens against Claude's $5/$25, and the weights are MIT licensed. The coding frontier just became a commodity. === DATE === April 26, 2026 === AUTHOR === Jane Sterling === READ_TIME === 9-minute read === HERO_IMG === img/content.png === SCRIPT_LABEL === Video Script (9 min, clean transcript for captioning) === SCRIPT === A Chinese AI lab just shipped a model that matches Claude Opus 4.6 on the toughest coding benchmark in the industry. They released the weights for free under MIT license. They priced the API at one seventh of what Anthropic charges. And they did it on a Friday afternoon while most of Silicon Valley was already heading into the weekend. DeepSeek V4-Pro went live on April 24th, 2026. The official announcement landed on the DeepSeek developer news feed at the same hour Hugging Face turned on the model card. The Pro model carries 1.6 trillion total parameters with 49 billion active per token in a mixture of experts arrangement. The Flash sibling carries 284 billion total with 13 billion active. Both are MIT licensed, both ship with a 1 million token context window as the default, and both can be downloaded immediately. If you have followed the coding agent race for the last twelve months, that 1.6 trillion number should make you sit up. Anthropic's Claude Opus 4.6 had been the unchallenged king of agentic coding work since the start of the year. It scored 80.8 percent on SWE-bench Verified, the Princeton benchmark that asks a model to actually fix real GitHub issues end to end. Every closed lab in the world has been chasing that number. DeepSeek V4-Pro hit 80.6 percent. That is two tenths of a single percentage point behind a model you cannot self host, you cannot fine tune, and you cannot run inside an air gapped network. With V4-Pro, you can do all three. This is not a benchmark where the gap closes by accident. SWE-bench Verified is the eval that broke open the agentic coding category in the first place. When Cognition demoed Devin, the headline number was a SWE-bench score. When Anthropic launched Claude 3.5 Sonnet for coding, the headline was SWE-bench. It is the test the entire industry has agreed measures the thing that actually matters, which is whether you can hand the model a real bug and walk away. For an open weights model to pull within 0.2 points of the closed leader is a structural event. It is the moment the coding frontier became a commodity. And that brings us to the price tag, because the price tag is the part that should keep every closed API CFO awake tonight. Claude Opus 4.6 charges 5 dollars per million input tokens and 25 dollars per million output tokens. DeepSeek V4-Pro charges 1 dollar 74 cents per million input on cache miss, 14 and a half cents per million on cache hit, and 3 dollars 48 cents per million output. That is roughly one seventh the marginal cost of Claude for the same coding work, and if you do not want to pay even that, you can host it yourself. If you are running a coding agent at scale, the math just changed. That is the headline. Now let us look at what is actually inside. The benchmark sweep DeepSeek published with the release reads like a competitive teardown of the closed frontier. On LiveCodeBench, V4-Pro hit 93.5 percent, ahead of Claude at 88.8 and Gemini 3.1 Pro at 91.7.
On Codeforces, the algorithmic competition rating, V4-Pro pulled a 3206, ahead of GPT-5.4 at 3168 and Gemini at 3052. On Terminal-Bench 2.0, the eval that measures how well an agent can drive a real Linux shell and recover from its own mistakes, V4-Pro scored 67.9 percent against Claude's 65.4. Three coding evals. V4-Pro leads on all three. The story is not all one sided. On Humanity's Last Exam, the ferociously hard reasoning benchmark, V4-Pro hit 37.7 percent against Claude's 40, GPT-5.4 at 39.8, and Gemini at 44.4. On SimpleQA Verified, the world knowledge eval, V4-Pro scored 57.9 against Gemini at 75.6. On HMMT 2026, the math competition benchmark, V4-Pro hit 95.2 against Claude at 96.2 and GPT-5.4 at 97.7. So V4-Pro is the new coding frontier. It is not the new everything frontier. World knowledge and the hardest reasoning still belong to Gemini and the closed labs, by margins that matter. But coding is the use case that pays the bills right now. Cursor, Replit, GitHub Copilot, every YC backed coding agent startup, Anthropic's own Claude Code product, all of them depend on a frontier coding model and all of them have been paying frontier prices to get one. V4-Pro just made that economic moat much shallower. How did DeepSeek pull this off? The technical details matter. V4-Pro was trained on 33 trillion tokens. It uses a mixture of FP8 and FP4 precision, which is far more aggressive than the BF16 most US labs train in. DeepSeek says memory consumption is 9.5 to 13.7 times lower than V3.2, which is the kind of efficiency claim that, if it holds up under independent reproduction, is going to force a rethink across the industry. They also validated the model on Huawei Ascend NPUs alongside Nvidia GPUs. That last detail is the geopolitical tell. DeepSeek is openly building a chip supply chain that does not need TSMC or Nvidia for inference. The market noticed within hours. Shares of SMIC, the Chinese chipmaker that fabs the Ascend silicon, jumped 10 percent on the Hong Kong open. Shares of MiniMax and Knowledge Atlas, two Chinese closed model competitors, fell more than 9 percent. The pricing pressure does not just hit Anthropic and OpenAI. It hits every closed model lab in the world, and the closed Chinese labs are getting hit first because they cannot retreat behind a regulatory moat. Michael Kratsios, the US science advisor, came out the same day with a statement calling Chinese AI development industrial scale copying of American work, saying there is nothing innovative about systematically extracting and copying. China's foreign ministry called the accusation groundless and a smear. Both statements were issued within 24 hours of the V4-Pro weights going public. The political response is now baked in. So what does it mean when an open weights model ties the closed coding frontier at one seventh the price? It means three things, and they all happen at once. First, the coding agent business model just became a margin trap for anyone running closed weights. If you are Cursor charging twenty dollars a month per developer seat, the frontier model under the hood used to be your moat. Now your competitor can self host V4-Pro on rented Ascend or H100 capacity and undercut you on the unit economics while delivering a coding model that ties yours on the eval that matters. Cursor's CEO has been aggressive about model agnosticism precisely because his team saw this coming. Anthropic, by contrast, has built its enterprise pitch on the premise that Opus is uniquely good at coding.
That premise just got a lot harder to defend. Second, the export control conversation gets more complicated, not simpler. The US government has spent two years tightening the rules around shipping advanced GPUs to China on the theory that compute scarcity would slow down the Chinese frontier. V4-Pro's training run almost certainly involved smuggled or stockpiled H100 capacity, but the model itself runs efficiently on Ascend silicon for inference. That means the chokepoint the export controls were designed to create sat at the inference layer, and that is exactly the layer DeepSeek just routed around. Expect the US response to escalate, possibly toward open weights distribution itself, which is a fight nobody in the open source community wants to have. Third, and this is the one that takes longer to play out, the safety story for frontier AI just got messier. Anthropic, OpenAI, and Google have spent years arguing that their closed models can be more rigorously evaluated, red teamed, and gated than open ones. That argument depends on the closed models actually being meaningfully better at the things that matter. V4-Pro is a 1.6 trillion parameter model, MIT licensed, that you can fine tune on whatever dataset you want, including ones the original lab would refuse. The capability gap that justified gating just narrowed. The next time a frontier capability emerges, the open weights version is going to be days or weeks behind, not months. There are still real reasons to be skeptical. Early reviewers said benchmarks do not tell the full story, and they do not. V4-Pro is weaker on world knowledge and the hardest reasoning. Production reliability under sustained agent workloads is unproven. Anthropic has years of red team data and incident response infrastructure that DeepSeek simply does not have. Choosing V4-Pro for a regulated workload today is not the same as choosing Claude. But for the median coding task, which is the use case driving the entire AI economy right now, the gap is gone. The coding frontier is open, MIT licensed, and one seventh the price. That is the part that does not get walked back. If you are a developer, you will be testing V4-Pro inside your editor by next week. If you are a closed model lab, your pricing committee is meeting on Monday. And if you are wondering how the next twelve months of the AI race look, this is the floor. Open weights at frontier. From here, the only direction is down on price and up on capability. I am Jane Sterling. This was Sterling Intelligence. === SCRIPT_HTML === === ANNOTATED_LABEL === Annotated Script (with b-roll & cut cues) === ANNOTATED_HTML === [TALKING HEAD — hook]
A Chinese AI lab just shipped a model that matches Claude Opus 4.6 on the toughest coding benchmark in the industry. They released the weights for free under MIT license. They priced the API at one seventh of what Anthropic charges. And they did it on a Friday afternoon while most of Silicon Valley was already heading into the weekend.
[CUT] [VOICEOVER — scene 1] [B-ROLL: company-logo:DeepSeek]DeepSeek V4-Pro went live on April 24th, 2026. The official announcement landed on the DeepSeek developer news feed at the same hour Hugging Face turned on the model card.
[B-ROLL: screen-capture:DeepSeek api-docs news page] [STAT CARD: "1.6T total params · 49B active"] [STAT CARD: "284B total · 13B active (V4-Flash)"] [STAT CARD: "1M token context, MIT license"]The Pro model carries 1.6 trillion total parameters with 49 billion active per token in a mixture of experts arrangement. The Flash sibling carries 284 billion total with 13 billion active. Both are MIT licensed, both ship with a 1 million token context window as the default, and both can be downloaded immediately.
[B-ROLL: ai-abstract] [STAT CARD: "Claude Opus 4.6: 80.8% SWE-bench Verified"] [STAT CARD: "DeepSeek V4-Pro: 80.6% SWE-bench Verified"]If you have followed the coding agent race for the last twelve months, that 1.6 trillion number should make you sit up. Anthropic's Claude Opus 4.6 had been the unchallenged king of agentic coding work since the start of the year. It scored 80.8 percent on SWE-bench Verified, the Princeton benchmark that asks a model to actually fix real GitHub issues end to end. Every closed lab in the world has been chasing that number. DeepSeek V4-Pro hit 80.6 percent. That is two tenths of a single percentage point behind a model you cannot self host, you cannot fine tune, and you cannot run inside an air gapped network. With V4-Pro, you can do all three.
[B-ROLL: code-terminal] [B-ROLL: stills:Devin demo screen]This is not a benchmark where the gap closes by accident. SWE-bench Verified is the eval that broke open the agentic coding category in the first place. When Cognition demoed Devin, the headline number was a SWE-bench score. When Anthropic launched Claude 3.5 Sonnet for coding, the headline was SWE-bench. It is the test the entire industry has agreed measures the thing that actually matters, which is whether you can hand the model a real bug and walk away. For an open weights model to pull within 0.2 points of the closed leader is a structural event. It is the moment the coding frontier became a commodity.
[STAT CARD: "Claude Opus 4.6: $5 / $25 per M tokens"] [STAT CARD: "V4-Pro: $1.74 / $3.48 per M tokens"] [STAT CARD: "$0.145 per M input on cache hit"] [STAT CARD: "~1/7th the marginal cost of Claude"]And that brings us to the price tag, because the price tag is the part that should keep every closed API CFO awake tonight. Claude Opus 4.6 charges 5 dollars per million input tokens and 25 dollars per million output tokens. DeepSeek V4-Pro charges 1 dollar 74 cents per million input on cache miss, 14 and a half cents per million on cache hit, and 3 dollars 48 cents per million output. That is roughly one seventh the marginal cost of Claude for the same coding work, and you can host it yourself if you do not even want to pay that. If you are running a coding agent at scale, the math just changed.
[/VOICEOVER] [TALKING HEAD — transition]That is the headline. Now let us look at what is actually inside.
[CUT] [VOICEOVER — scene 2] [B-ROLL: finance-charts] [STAT CARD: "LiveCodeBench: V4-Pro 93.5% · Claude 88.8% · Gemini 91.7%"] [STAT CARD: "Codeforces: V4-Pro 3206 · GPT-5.4 3168 · Gemini 3052"] [STAT CARD: "Terminal-Bench 2.0: V4-Pro 67.9% · Claude 65.4%"]The benchmark sweep DeepSeek published with the release reads like a competitive teardown of the closed frontier. On LiveCodeBench, V4-Pro hit 93.5 percent, ahead of Claude at 88.8 and Gemini 3.1 Pro at 91.7. On Codeforces, the algorithmic competition rating, V4-Pro pulled a 3206, ahead of GPT-5.4 at 3168 and Gemini at 3052. On Terminal-Bench 2.0, the eval that measures how well an agent can drive a real Linux shell and recover from its own mistakes, V4-Pro scored 67.9 percent against Claude's 65.4. Three coding evals. V4-Pro leads on all three.
[B-ROLL: data-center] [STAT CARD: "HLE: V4-Pro 37.7% · Claude 40% · GPT-5.4 39.8% · Gemini 44.4%"] [STAT CARD: "SimpleQA Verified: V4-Pro 57.9% · Gemini 75.6%"] [STAT CARD: "HMMT 2026: V4-Pro 95.2% · Claude 96.2% · GPT-5.4 97.7%"]The story is not all one sided. On Humanity's Last Exam, the ferociously hard reasoning benchmark, V4-Pro hit 37.7 percent against Claude's 40, GPT-5.4 at 39.8, and Gemini at 44.4. On SimpleQA Verified, the world knowledge eval, V4-Pro scored 57.9 against Gemini at 75.6. On HMMT 2026, the math competition benchmark, V4-Pro hit 95.2 against Claude at 96.2 and GPT-5.4 at 97.7. So V4-Pro is the new coding frontier. It is not the new everything frontier. World knowledge and the hardest reasoning still belong to Gemini and the closed labs, by margins that matter.
[B-ROLL: company-logo:Cursor] [B-ROLL: company-logo:GitHub Copilot]But coding is the use case that pays the bills right now. Cursor, Replit, GitHub Copilot, every YC backed coding agent startup, Anthropic's own Claude Code product, all of them depend on a frontier coding model and all of them have been paying frontier prices to get one. V4-Pro just made that economic moat much shallower.
[B-ROLL: code-terminal] [STAT CARD: "33T training tokens"] [STAT CARD: "Mixed FP8 / FP4 precision"] [STAT CARD: "9.5x to 13.7x less memory than V3.2"]How did DeepSeek pull this off. The technical details matter. V4-Pro was trained on 33 trillion tokens. It uses a mixture of FP8 and FP4 precision, which is far more aggressive than the FP16 most US labs train at. DeepSeek says memory consumption is 9.5 to 13.7 times lower than V3.2, which is the kind of efficiency claim that, if it holds up under independent reproduction, is going to force a rethink across the industry. They also validated the model on Huawei Ascend NPUs alongside Nvidia GPUs. That last detail is the geopolitical tell. DeepSeek is openly building a chip supply chain that does not need TSMC or Nvidia for inference.
[B-ROLL: finance-charts] [STAT CARD: "SMIC +10% on Hong Kong open"] [STAT CARD: "MiniMax / Knowledge Atlas −9%"]The market noticed within hours. Shares of SMIC, the Chinese chipmaker that fabs the Ascend silicon, jumped 10 percent on the Hong Kong open. Shares of MiniMax and Knowledge Atlas, two Chinese closed model competitors, fell more than 9 percent. The pricing pressure does not just hit Anthropic and OpenAI. It hits every closed model lab in the world, and the closed Chinese labs are getting hit first because they cannot retreat behind a regulatory moat.
[B-ROLL: news-studio] [B-ROLL: stills:Michael Kratsios White House]Michael Kratsios, the US science advisor, came out the same day with a statement calling Chinese AI development industrial scale copying of American work, saying there is nothing innovative about systematically extracting and copying. China's foreign ministry called the accusation groundless and a smear. Both statements were issued within 24 hours of the V4-Pro weights going public. The political response is now baked in.
[/VOICEOVER] [TALKING HEAD — transition]So what does it mean when an open weights model ties the closed coding frontier at one seventh the price?
It means three things, and they all happen at once.
[CUT] [VOICEOVER — scene 3] [B-ROLL: company-logo:Cursor] [B-ROLL: company-logo:Anthropic]First, the coding agent business model just became a margin trap for anyone running closed weights. If you are Cursor charging twenty dollars a month per developer seat, the frontier model under the hood used to be your moat. Now your competitor can self host V4-Pro on rented Ascend or H100 capacity and undercut you on the unit economics while delivering a coding model that ties yours on the eval that matters. Cursor's CEO has been aggressive about model agnosticism precisely because his team saw this coming. Anthropic, by contrast, has built its enterprise pitch on the premise that Opus is uniquely good at coding. That premise just got a lot harder to defend.
[B-ROLL: courtroom] [B-ROLL: stills:Commerce Department export controls]Second, the export control conversation gets more complicated, not simpler. The US government has spent two years tightening the rules around shipping advanced GPUs to China on the theory that compute scarcity would slow down the Chinese frontier. V4-Pro's training run almost certainly involved smuggled or stockpiled H100 capacity, but the model itself runs efficiently on Ascend silicon for inference. That means the chokepoint the export controls were designed to create sat at the inference layer, and that is exactly the layer DeepSeek just routed around. Expect the US response to escalate, possibly toward open weights distribution itself, which is a fight nobody in the open source community wants to have.
[B-ROLL: ai-abstract] [B-ROLL: code-terminal]Third, and this is the one that takes longer to play out, the safety story for frontier AI just got messier. Anthropic, OpenAI, and Google have spent years arguing that their closed models can be more rigorously evaluated, red teamed, and gated than open ones. That argument depends on the closed models actually being meaningfully better at the things that matter. V4-Pro is a 1.6 trillion parameter model, MIT licensed, that you can fine tune on whatever dataset you want, including ones the original lab would refuse. The capability gap that justified gating just narrowed. The next time a frontier capability emerges, the open weights version is going to be days or weeks behind, not months.
[B-ROLL: transition]There are still real reasons to be skeptical. Early reviewers said benchmarks do not tell the full story, and they do not. V4-Pro is weaker on world knowledge and the hardest reasoning. Production reliability under sustained agent workloads is unproven. Anthropic has years of red team data and incident response infrastructure that DeepSeek simply does not have. Choosing V4-Pro for a regulated workload today is not the same as choosing Claude.
But for the median coding task, which is the use case driving the entire AI economy right now, the gap is gone. The coding frontier is open, MIT licensed, and one seventh the price. That is the part that does not get walked back.
[/VOICEOVER] [TALKING HEAD — sign-off]If you are a developer, you will be testing V4-Pro inside your editor by next week. If you are a closed model lab, your pricing committee is meeting on Monday. And if you are wondering how the next twelve months of the AI race look, this is the floor. Open weights at frontier. From here, the only direction is down on price and up on capability. I am Jane Sterling. This was Sterling Intelligence.
=== ARTICLE_HTML ===On April 24, 2026, DeepSeek released V4-Pro and V4-Flash under the MIT License. The headline number is the one Anthropic does not want printed: 80.6 percent on SWE-bench Verified, two tenths of a point behind Claude Opus 4.6, at roughly one seventh the marginal cost.
That gap, on the benchmark the entire coding agent industry has agreed defines the frontier, is now small enough to be a rounding error. The model is open weights, downloadable from Hugging Face on day one, and validated for inference on both Nvidia and Huawei Ascend silicon.
The implication is structural. The closed frontier coding model was the highest margin product in the AI stack. It is not anymore.
V4-Pro carries 1.6 trillion total parameters with 49 billion active per token. V4-Flash carries 284 billion total with 13 billion active. Both use a mixture of experts architecture, both ship with a 1 million token context window by default, and both were released simultaneously under the MIT License through DeepSeek's API, Hugging Face, and a public web service.
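Those mixture of experts numbers are easier to feel with a little arithmetic. The sketch below uses only the published parameter counts plus a generic two-FLOPs-per-active-parameter rule of thumb for the forward pass; none of it comes from DeepSeek's technical report.

```python
# Back-of-envelope MoE arithmetic from the published parameter counts.
# The 2-FLOPs-per-active-parameter forward-pass estimate is a generic
# rule of thumb, not a figure DeepSeek has published.

def active_fraction(total_params: float, active_params: float) -> float:
    """Share of weights touched per token in a mixture-of-experts model."""
    return active_params / total_params

def forward_gflops_per_token(active_params: float) -> float:
    """Rough forward-pass compute per token: ~2 FLOPs per active parameter."""
    return 2 * active_params / 1e9

models = {
    "V4-Pro":   {"total": 1.6e12, "active": 49e9},
    "V4-Flash": {"total": 284e9,  "active": 13e9},
}

for name, m in models.items():
    print(f"{name}: {active_fraction(m['total'], m['active']):.1%} of weights "
          f"active per token, ~{forward_gflops_per_token(m['active']):.0f} "
          f"GFLOPs per token")
# V4-Pro: 3.1% of weights active per token, ~98 GFLOPs per token
# V4-Flash: 4.6% of weights active per token, ~26 GFLOPs per token
```

That roughly 3 percent active share is the economics of the model in one number: per-token compute scales with the 49 billion active parameters, while the memory bill scales with the full 1.6 trillion.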
The training run consumed 33 trillion tokens. DeepSeek's own technical report describes a mixed FP8 and FP4 precision scheme and claims memory consumption 9.5 to 13.7 times lower than V3.2, the previous generation.
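The precision scheme is where most of that memory headroom comes from. Here is a minimal sketch of the raw weight-storage floor at each precision, using standard bytes-per-parameter figures; real serving memory adds KV cache, activations, and runtime overhead on top, so treat these as lower bounds, not footprints.

```python
# Weight-storage floors for a 1.6T-parameter model at common precisions.
# Bytes-per-parameter values are standard; everything else a server needs
# (KV cache, activations, framework overhead) sits on top of these numbers.

BYTES_PER_PARAM = {"BF16/FP16": 2.0, "FP8": 1.0, "FP4": 0.5}
TOTAL_PARAMS = 1.6e12

for precision, bytes_pp in BYTES_PER_PARAM.items():
    terabytes = TOTAL_PARAMS * bytes_pp / 1e12
    print(f"{precision}: ~{terabytes:.1f} TB just for the weights")
# BF16/FP16: ~3.2 TB
# FP8: ~1.6 TB
# FP4: ~0.8 TB
```

Note that DeepSeek's 9.5 to 13.7 times figure is a claim about total serving memory against V3.2, not just weight storage, so it cannot be checked from this arithmetic alone; that is exactly the kind of number independent reproductions will need to confirm.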
On SWE-bench Verified, V4-Pro scored 80.6 percent against Claude Opus 4.6's 80.8 percent. V4-Flash scored 79.0 percent. On LiveCodeBench, V4-Pro hit 93.5 percent, ahead of Claude at 88.8 and Gemini 3.1 Pro at 91.7. On Codeforces, V4-Pro pulled a 3206 rating, ahead of GPT-5.4 at 3168 and Gemini at 3052. On Terminal-Bench 2.0, V4-Pro scored 67.9 percent against Claude's 65.4.
Across the four headline coding evals, V4-Pro leads on three and ties on the fourth. For an open weights model, that is unprecedented.
On Humanity's Last Exam, the hardest published reasoning benchmark, V4-Pro scored 37.7 percent against Claude's 40, GPT-5.4 at 39.8, and Gemini 3.1 Pro at 44.4. On SimpleQA Verified, the world knowledge eval, V4-Pro hit 57.9 against Gemini's 75.6. On HMMT 2026, the math competition benchmark, V4-Pro scored 95.2 against Claude at 96.2 and GPT-5.4 at 97.7.
The closed labs still hold the lead in pure reasoning, world knowledge, and the hardest math. Those gaps are real, and on world knowledge the gap is wide, but none of them sit in the capabilities that drive revenue at Cursor, Replit, or GitHub Copilot.
V4-Pro is priced at $1.74 per million input tokens on cache miss, $0.145 per million on cache hit, and $3.48 per million output tokens. V4-Flash is $0.028 per million input on cache hit and $0.28 per million output. Claude Opus 4.6 is priced at $5 per million input and $25 per million output. GPT-5.5 is $5 input and $30 output.
For coding workloads at scale, V4-Pro's marginal cost lands at roughly one seventh of Claude Opus 4.6 and one ninth of GPT-5.5. Self hosting on rented capacity drives the marginal cost lower still.
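A worked example makes the ratio concrete. The prices below are the rate-card numbers quoted above; the workload shape, 500 million input and 100 million output tokens a month with a 60 percent prompt-cache hit rate, is invented for illustration, and Claude is modeled without caching to keep the sketch simple.

```python
# Illustrative monthly bill for a hypothetical coding-agent workload.
# Rate-card prices come from the article; the token volumes and the
# 60% cache-hit rate are made-up inputs, not measurements.

def monthly_cost(input_mtok: float, output_mtok: float,
                 in_price: float, out_price: float,
                 cache_hit_rate: float = 0.0,
                 cache_hit_price: float | None = None) -> float:
    """Dollars per month for the given volumes in millions of tokens."""
    if cache_hit_price is None:
        cache_hit_price = in_price
    input_cost = input_mtok * ((1 - cache_hit_rate) * in_price
                               + cache_hit_rate * cache_hit_price)
    return input_cost + output_mtok * out_price

IN_M, OUT_M = 500, 100  # hypothetical monthly volume, millions of tokens

v4_pro = monthly_cost(IN_M, OUT_M, 1.74, 3.48,
                      cache_hit_rate=0.60, cache_hit_price=0.145)
claude = monthly_cost(IN_M, OUT_M, 5.00, 25.00)  # caching not modeled

print(f"V4-Pro: ${v4_pro:,.0f}/month, Claude Opus 4.6: ${claude:,.0f}/month, "
      f"ratio {claude / v4_pro:.1f}x")
# V4-Pro: $740/month, Claude Opus 4.6: $5,000/month, ratio 6.8x
```

On this invented mix the ratio lands just under seven to one, consistent with the rough one-seventh figure; shifting the mix toward output tokens or higher cache hit rates pushes it wider, since those are where the price gaps are largest.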
Within hours of the release, SMIC shares jumped 10 percent on the Hong Kong open. MiniMax and Knowledge Atlas, two Chinese closed model competitors, fell more than 9 percent. Huawei announced full Ascend processor support for the model the same day. Western coding agent companies have not yet posted formal responses, but observers expect repricing within weeks.
Michael Kratsios, the White House science advisor, described Chinese AI development as industrial scale copying of American work, saying there is nothing innovative about systematically extracting and copying. China's foreign ministry called the allegations groundless and a smear against the achievements of China's AI industry.
The release also signals a closer integration between DeepSeek and Huawei's Ascend silicon roadmap. DeepSeek validated V4-Pro for inference on Ascend NPUs alongside Nvidia GPUs, sidestepping the chokepoint US export controls were designed to create.
The two labs that built their enterprise pitch on coding superiority face the most direct exposure. Anthropic in particular has positioned Claude Opus and Claude Code around the premise of a unique coding edge. That premise is now defensible only on closed source advantages: red team maturity, production reliability, vendor support, and integration depth. Capability parity on benchmarks is no longer the moat.
Expect price compression at the top of the closed API stack within the quarter, plus accelerated investment in coding agent products that monetize beyond raw token cost.
Three signals will determine how durable this moment is. First, independent reproductions of the SWE-bench Verified score over the next two weeks. Second, whether US export control policy expands to cover open weights distribution itself, an escalation that would touch Hugging Face directly. Third, whether the next Anthropic or OpenAI model release reclaims a clear lead on the coding benchmarks or whether the gap holds.
=== YOUTUBE_DESC === A Chinese open source AI just tied Claude Opus 4.6 on the toughest coding benchmark in the industry. And it costs one seventh as much. DeepSeek V4-Pro launched April 24, 2026 under MIT License. 1.6 trillion total parameters, 49 billion active, 1 million token context. SWE-bench Verified: 80.6 percent against Claude's 80.8. LiveCodeBench: 93.5 against Claude's 88.8. Codeforces: 3206 against GPT-5.4 at 3168. Terminal-Bench 2.0: 67.9 against Claude's 65.4. Three coding evals where V4-Pro leads, one where it ties. The coding frontier just became a commodity.

In this episode of Sterling Intelligence we walk through what shipped, what the benchmarks actually mean, the pricing math against Claude Opus 4.6 and GPT-5.5, the Huawei Ascend hardware angle, and the three structural shifts this triggers across coding agents, US export controls, and the safety story for frontier AI.

Subscribe for AI news without the hype. New episodes every week. No filler.

⏱ Chapters
00:00 Cold open: an open model just tied Claude on coding
00:38 V4-Pro and V4-Flash specs, MIT License, 1M context
01:32 Why SWE-bench Verified is the benchmark that matters
02:24 Pricing: V4-Pro vs Claude Opus 4.6 vs GPT-5.5
03:18 Coding sweep: LiveCodeBench, Codeforces, Terminal-Bench 2.0
04:25 Where the closed labs still win: HLE, SimpleQA, HMMT
05:10 The training: 33T tokens, FP8/FP4, Huawei Ascend
06:05 Market reaction: SMIC up 10%, MiniMax down 9%
06:48 Kratsios vs Beijing: industrial scale copying or smear
07:25 Implication 1: coding agent margins collapse
08:05 Implication 2: export controls just got harder
08:45 Implication 3: the safety story for frontier AI
09:20 Caveats and the median coding task verdict
09:48 Sign off

🔗 Sources in the article companion at sterling.ai/faces/deepseek-v4-pro

#DeepSeek #V4Pro #OpenSource #ClaudeOpus #SWEBench #AICoding #MITLicense #Huawei #ExportControls #FrontierAI #ChinaAI #CodingAgents #Cursor #Anthropic #OpenAI === TITLES_HTML ===Expression. Quietly alarmed, slight forward lean, mouth closed. Like she just saw the numbers and is letting you in on the implication.
Head position. Squared to camera, chin level, slight tilt left of frame to leave headline real estate on the right.
Wardrobe. Standard dark blazer, charcoal collared shirt, no jewelry that catches light.
Eye direction. Direct to camera. No alternate take.
Lighting. Key light upper-left at ~4800K. Subtle teal-green rim light from behind-right at 30% to echo the DeepSeek brand without naming it.
Scene setup. Near-black charcoal studio. Faint code-glyph pattern on the right shoulder side at 10% opacity. Single accent: a small SWE-bench-style chart silhouette in the upper-right at 15% opacity.
Title option 1.
Position. Right two thirds of frame, two stacked lines.
Font. Heavy condensed sans, all caps, "80.6%" in oversized weight.
Color scheme. "80.6%" in #ffffff with a 6px charcoal stroke. "OPEN WEIGHTS" in DeepSeek teal #0c8d72. Subtle gold underline beneath.
Accent detail. Tiny "SWE-Bench" tag in 28pt under the percentage in #c8c8c8.
Title option 2.
Position. Centered top third over Jane's shoulder.
Font. Heavy display serif, all caps, single line.
Color scheme. "CLAUDE" in Anthropic-orange #d97757, "TIED" in white with a thin red strikethrough.
Accent detail. Small "by an open model" subline in 22pt italic, #c8c8c8.
Title option 3.
Position. Right two thirds, large numerals stacked over a divider.
Font. Mono numerals for credibility, sans subline.
Color scheme. "$3.48" in DeepSeek teal #0c8d72, "$25" in muted Anthropic-orange #d97757 with diagonal strike.
Accent detail. Small subline "per million output tokens" in 22pt #c8c8c8.