Video Script (9 min, clean transcript for captioning)
One day after OpenAI released GPT-5.5, DeepSeek struck back.
On April 24, 2026, the Chinese AI lab announced DeepSeek V4 Preview, releasing two new models simultaneously under the MIT open-source license: V4-Pro and V4-Flash. Free to use, modify, and deploy commercially. The timing was not accidental. DeepSeek has made a habit of landing its biggest releases at the precise moment the American AI industry is focused on its own headlines, and V4 continues that pattern.
Twelve months ago, DeepSeek V3's surprise arrival shook the global AI industry and triggered a sharp selloff in American AI and semiconductor stocks. That episode forced a complete reassessment of how much compute was actually necessary to train competitive models, and it put every major American AI lab on notice that Chinese AI was no longer a generation behind. The question in April 2025 was whether DeepSeek's V3 performance was a one-time anomaly or a repeatable pattern. V4 answers that question.
V4-Pro is a massive mixture-of-experts model with 1.6 trillion total parameters. On any single inference pass, it activates 49 billion of those parameters. V4-Flash is the lighter sibling: 284 billion total parameters, 13 billion active per inference pass. Both models ship with a 1 million token context window by default. The predecessor, V3.2, had a 128K context window. DeepSeek raised that ceiling nearly eightfold, handing the open-source community a context length that matches what the closed-source frontier labs charge premium prices to access.
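One way to see why sparse activation matters: per-token inference cost tracks active parameters, not total parameters. A back-of-envelope sketch, using the common 2N-FLOPs-per-token approximation (a rule of thumb we are assuming here, not a figure DeepSeek has published):

```python
# Rough comparison of per-token inference compute for a mixture-of-experts
# model vs a hypothetical dense model of the same total size. The 2 * N
# FLOPs-per-token rule is an approximation, not DeepSeek's published math.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active_params

V4_PRO_TOTAL = 1.6e12   # 1.6T parameters stored in memory
V4_PRO_ACTIVE = 49e9    # 49B parameters routed per inference pass
V4_FLASH_ACTIVE = 13e9  # 13B active in the smaller model

dense_equivalent = flops_per_token(V4_PRO_TOTAL)
moe_actual = flops_per_token(V4_PRO_ACTIVE)

# The router sends each token through a small slice of the network,
# so compute scales with the active slice, not the full 1.6T.
print(f"Compute vs hypothetical dense 1.6T model: {moe_actual / dense_equivalent:.1%}")
```

Under this approximation, V4-Pro generates tokens at roughly the compute cost of a 49B dense model while storing 1.6T parameters of capacity.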
A 1 million token context window is not just a bigger number. It is a different category of capability. You can feed an entire large software repository into a single prompt. You can load a year of medical records, a full legal discovery set, or months of research documents without chunking, without retrieval pipelines, without stitching responses together. Engineers have spent years building workarounds for short context limits. V4 makes most of those workarounds obsolete, and it ships as open source, for free.
DeepSeek published the model weights on Hugging Face. V4-Pro downloads at 865 gigabytes. V4-Flash comes in at 160 gigabytes. Both are available for local and self-hosted deployment, meaning any organization can run these models inside their own infrastructure without sending data to DeepSeek's servers. Multiple countries have already banned or restricted DeepSeek's cloud services over data privacy and national security concerns. The open weights sidestep those restrictions entirely.
The benchmark numbers are where V4-Pro gets genuinely difficult to dismiss. Start with software engineering. On SWE-bench Verified, the leading benchmark for real GitHub issue resolution, V4-Pro scored 80.6 percent. Claude Opus 4.6 scored 80.8 percent. That is a gap of 0.2 percentage points between a free-to-download open-source model and one of the most capable closed-source models in the world. Anyone looking at that leaderboard who expected a larger gap got a rude surprise.
On competitive programming, V4-Pro did not just approach the American frontier. It passed it. The model achieved a Codeforces competitive programming rating of 3,206. GPT-5.4 scores 3,168 on the same leaderboard. That makes DeepSeek V4-Pro the highest-rated open-source model in competitive programming history at the time of release, and it outscored the OpenAI equivalent at that specific task. A Codeforces rating in that range places the model among elite competitive programmers, capable of solving algorithmic problems that most professional engineers cannot crack.
The science and mathematics results follow the same pattern. V4-Pro scored 90.1 percent on GPQA Diamond, which tests PhD-level science reasoning across biology, chemistry, and physics. It scored 87.5 percent on MMLU-Pro, the advanced multidomain knowledge benchmark. On the HMMT 2026 February math competition benchmark, it scored 95.2 percent. In certain technical domains, V4-Pro is not near-frontier. It IS the frontier.
Long-context performance holds up under structured testing. On MRCR 1M, a retrieval benchmark designed to test whether a model can find and correctly use specific information buried anywhere inside a 1 million token context, V4-Pro scored 83.5 percent. That is notable because many models that advertise large context windows perform poorly on retrieval tasks at the extreme of their stated limit, delivering a headline number without reliable performance. The context window is not just a marketing figure.
On Terminal-Bench 2.0, a benchmark for autonomous coding agents that must navigate a real terminal environment, V4-Pro scored 67.9 percent. GPT-5.5 scores 82.7 percent on the same benchmark. That gap matters for anyone building fully agentic systems. On SWE-bench Pro, a harder and more recent coding evaluation, V4-Pro scored 55.4 percent, suggesting that headline SWE-bench number does not capture performance on more complex tasks. And on broad world-knowledge tasks, MIT Technology Review reports that V4-Pro still trails Gemini 3.1 Pro and GPT-5.4 by an estimated 3 to 6 months. The picture is mixed, but V4-Pro excels where developers spend most of their compute budget.
Against other open-source competitors, the story is less mixed. Alibaba's Qwen and Z.ai's GLM, the two primary open-weight rivals, are both outperformed by V4-Pro on coding and mathematics benchmarks. DeepSeek has consolidated its position as the dominant open-weight AI provider globally, and V4 extends that lead substantially. For any team currently benchmarking Qwen or GLM for production workloads, V4-Flash's pricing and context window make it a direct replacement worth evaluating.
The architecture behind the performance is a genuine engineering achievement. DeepSeek built what they call Hybrid Attention, combining Compressed Sparse Attention and Heavily Compressed Attention mechanisms to handle extremely long sequences without the quadratic growth in memory and compute cost that standard attention incurs. At 1 million token context length, V4-Pro requires only 27 percent of the inference FLOPs that V3.2 needed for the same context. The KV cache, the memory structure used to track what the model has processed, drops to just 10 percent of V3.2's requirement. The model got more capable at long context AND more efficient to run at long context simultaneously. V4-Pro was trained on more than 32 trillion tokens, among the largest training datasets for any publicly released model.
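To make the KV-cache claim concrete, here is the standard cache-size formula for a decoder transformer. The layer and head counts below are illustrative placeholders, not published V4 or V3.2 dimensions; only the formula itself is standard.

```python
# Generic KV-cache size estimator. The model dimensions used below are
# ILLUSTRATIVE ASSUMPTIONS -- DeepSeek has not published V4's exact config.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Cache size: 2 tensors (K and V) per layer, one head_dim vector
    per KV head per token, at `bytes_per_value` precision (fp16 default)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

CTX = 1_000_000  # 1M-token context

# Hypothetical baseline config standing in for an uncompressed cache:
baseline = kv_cache_bytes(CTX, n_layers=61, n_kv_heads=128, head_dim=128)

# The script's claim: V4's hybrid attention needs ~10% of the old cache.
compressed = 0.10 * baseline

print(f"uncompressed KV cache at 1M ctx: {baseline / 2**30:,.0f} GiB")
print(f"at 10% of baseline:              {compressed / 2**30:,.0f} GiB")
```

Even with made-up dimensions, the point survives: an uncompressed cache at 1M tokens runs to terabytes, which is why a 10x cache reduction is the difference between needing a GPU cluster and needing a rack.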
The pricing makes that efficiency impossible to ignore. V4-Pro costs $1.74 per million input tokens and $3.48 per million output tokens. GPT-5.5 costs $5 per million input tokens and $30 per million output tokens. Claude Opus 4.7 costs $5 per million input tokens and $25 per million output tokens. For tasks where V4-Pro matches frontier quality, it runs at roughly one-sixth the cost of frontier competitors. For teams running millions of coding or reasoning queries per month, that is the difference between a workload being economically viable or not. V4-Flash takes that logic further: $0.14 per million input tokens and $0.28 per million output tokens, making it one of the cheapest capable AI APIs available anywhere in the world.
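The "roughly one-sixth" figure depends on the input/output token mix, since the gap is much wider on output tokens than input tokens. A quick sketch under an assumed 50/50 mix (the mix is our assumption; real workloads often skew toward input):

```python
# Blended per-million-token cost under an assumed 50/50 input/output mix.
# Prices are the rates quoted in the script; the mix is an assumption.

PRICES = {  # (input $/M tokens, output $/M tokens)
    "DeepSeek V4-Pro":   (1.74, 3.48),
    "DeepSeek V4-Flash": (0.14, 0.28),
    "GPT-5.5":           (5.00, 30.00),
    "Claude Opus 4.7":   (5.00, 25.00),
}

def blended_rate(input_price, output_price, input_share=0.5):
    """Effective $/M tokens for a workload with the given input share."""
    return input_share * input_price + (1 - input_share) * output_price

for name, (pin, pout) in PRICES.items():
    print(f"{name:18s} ${blended_rate(pin, pout):6.2f} per M tokens (50/50 mix)")

ratio = blended_rate(5.00, 30.00) / blended_rate(1.74, 3.48)
print(f"GPT-5.5 costs ~{ratio:.1f}x V4-Pro at this mix")
```

At this mix the GPT-5.5 blended rate works out to roughly 6.7 times V4-Pro's, consistent with the one-sixth framing; an input-heavy workload narrows the gap, an output-heavy one widens it.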
Markets responded immediately. On V4 launch day, shares of SMIC, the Chinese chipmaker that manufactures Huawei's Ascend AI processors, jumped 10 percent in Hong Kong trading. Chinese AI competitors MiniMax and Knowledge Atlas fell over 9 percent. Investors read the same story the benchmarks told: DeepSeek is pulling away from its domestic rivals, and the Huawei chip angle signals something larger than a routine product update.
V4 is DeepSeek's first release with explicit optimization for Huawei Ascend processors. The United States has spent years enforcing export controls designed to block China's access to advanced AI training hardware, particularly Nvidia's high-end GPUs. The strategy rests on a single premise: limit chip access, limit frontier AI development. V4's Huawei optimization puts direct pressure on that premise.
Nvidia CEO Jensen Huang has said plainly what is at stake. His words: "The day that DeepSeek comes out on Huawei first, that is a horrible outcome for the U.S." V4 is not fully there yet. Liu Zhiyuan of Tsinghua University told MIT Technology Review that the majority of V4 training likely still depended on Nvidia hardware, suggesting partial rather than full chip independence. But the Huawei optimization is a step toward the scenario Huang described, not away from it. The export controls may still be buying time. It is becoming less clear how much time.
The IP dispute surrounding DeepSeek has intensified alongside V4. White House Office of Science and Technology Policy Director Michael Kratsios stated: "There is nothing innovative about systematically extracting and copying the innovations of American industry." Anthropic traced 24,000 fake accounts and 16 million unauthorized Claude API queries to Chinese AI firms, including DeepSeek, MiniMax, and Moonshot AI, as part of apparent model distillation operations. DeepSeek denies wrongdoing. The Frontier Model Forum, representing OpenAI, Anthropic, and Google, held a joint announcement on April 6 and 7, 2026, pledging to share intelligence to counter these distillation operations. DeepSeek released V4 eighteen days after that announcement.
DeepSeek has not provided a timeline for moving V4 out of Preview status. The API documentation notes that legacy deepseek-chat and deepseek-reasoner models will be fully retired on July 24, 2026, giving V4 roughly three months to prove stable in production before it becomes the only option on DeepSeek's platform.
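For teams that pin the legacy model ids, the retirement means a code change. A minimal migration sketch, assuming the replacement id is "deepseek-v4" (a placeholder, not confirmed by DeepSeek's docs) and the OpenAI-compatible request shape the DeepSeek API uses:

```python
# Sketch of a migration shim for the July 24, 2026 retirement of the legacy
# model ids. "deepseek-v4" is a PLACEHOLDER assumption -- check DeepSeek's
# API documentation for the real replacement id before shipping.

RETIRED = {"deepseek-chat", "deepseek-reasoner"}
REPLACEMENT = "deepseek-v4"  # hypothetical id

def migrate_model_id(model: str) -> str:
    """Map a retired legacy model id to its assumed replacement."""
    return REPLACEMENT if model in RETIRED else model

# Request body in the OpenAI-compatible shape:
request = {
    "model": migrate_model_id("deepseek-chat"),
    "messages": [{"role": "user", "content": "Summarize this repository."}],
}
print(request["model"])
```

A shim like this lets the model id change land in one place rather than being scattered across every call site.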
The AI race we are living through has stopped looking like a pure technology competition and started looking like something harder to define. There are export controls, intelligence coalitions, IP lawsuits, Huawei chips, and market swings wrapped around every significant model release. DeepSeek V4 is the clearest proof yet that the coding and reasoning gap between American frontier AI and Chinese open-source AI has EFFECTIVELY CLOSED. The open-source world now has a model that matches frontier performance on the tasks developers pay the most to run, available to anyone on the planet, for a fraction of the price. Whatever happens next in this race, benchmarks will not decide it alone.
Annotated Script (with b-roll & cut cues)
One day after OpenAI released GPT-5.5, DeepSeek struck back.
[CUT] [VOICEOVER — scene 1] [B-ROLL: ai-abstract]On April 24, 2026, the Chinese AI lab announced DeepSeek V4 Preview, releasing two new models simultaneously under the MIT open-source license: V4-Pro and V4-Flash. Free to use, modify, and deploy commercially. The timing was not accidental. DeepSeek has made a habit of landing its biggest releases at the precise moment the American AI industry is focused on its own headlines, and V4 continues that pattern.
[B-ROLL: company-logo:DeepSeek]Twelve months ago, DeepSeek V3's surprise arrival shook the global AI industry and triggered a sharp selloff in American AI and semiconductor stocks. That episode forced a complete reassessment of how much compute was actually necessary to train competitive models, and it put every major American AI lab on notice that Chinese AI was no longer a generation behind. The question in April 2025 was whether DeepSeek's V3 performance was a one-time anomaly or a repeatable pattern. V4 answers that question.
[STAT CARD: "V3 debut: April 2025 shockwave"] [B-ROLL: data-center]V4-Pro is a massive mixture-of-experts model with 1.6 trillion total parameters. On any single inference pass, it activates 49 billion of those parameters. V4-Flash is the lighter sibling: 284 billion total parameters, 13 billion active per inference pass. Both models ship with a 1 million token context window by default. The predecessor, V3.2, had a 128K context window. DeepSeek raised that ceiling nearly eightfold, handing the open-source community a context length that matches what the closed-source frontier labs charge premium prices to access.
[STAT CARD: "1600B total params in V4-Pro"] [STAT CARD: "V4-Pro: 49B params active per pass"] [STAT CARD: "V4-Flash: 284B params, 13B active"] [B-ROLL: code-terminal]A 1 million token context window is not just a bigger number. It is a different category of capability. You can feed an entire large software repository into a single prompt. You can load a year of medical records, a full legal discovery set, or months of research documents without chunking, without retrieval pipelines, without stitching responses together. Engineers have spent years building workarounds for short context limits. V4 makes most of those workarounds obsolete, and it ships as open source, for free.
[B-ROLL: data-center]DeepSeek published the model weights on Hugging Face. V4-Pro downloads at 865 gigabytes. V4-Flash comes in at 160 gigabytes. Both are available for local and self-hosted deployment, meaning any organization can run these models inside their own infrastructure without sending data to DeepSeek's servers. Multiple countries have already banned or restricted DeepSeek's cloud services over data privacy and national security concerns. The open weights sidestep those restrictions entirely.
[STAT CARD: "V4-Pro download: 865 GB model"] [STAT CARD: "V4-Flash download: 160 GB model"] [/VOICEOVER] [TALKING HEAD — transition]The benchmark numbers are where V4-Pro gets genuinely difficult to dismiss. Start with software engineering. On SWE-bench Verified, the leading benchmark for real GitHub issue resolution, V4-Pro scored 80.6 percent. Claude Opus 4.6 scored 80.8 percent. That is a gap of 0.2 percentage points between a free-to-download open-source model and one of the most capable closed-source models in the world. Anyone looking at that leaderboard who expected a larger gap got a rude surprise.
[STAT CARD: "SWE-bench: 80.6% vs 80.8% Claude"] [CUT] [VOICEOVER — scene 2] [B-ROLL: finance-charts]On competitive programming, V4-Pro did not just approach the American frontier. It passed it. The model achieved a Codeforces competitive programming rating of 3,206. GPT-5.4 scores 3,168 on the same leaderboard. That makes DeepSeek V4-Pro the highest-rated open-source model in competitive programming history at the time of release, and it outscored the OpenAI equivalent at that specific task. A Codeforces rating in that range places the model among elite competitive programmers, capable of solving algorithmic problems that most professional engineers cannot crack.
[STAT CARD: "Codeforces: 3206 — #1 open source"] [STAT CARD: "GPT-5.4 Codeforces rating: 3168"] [B-ROLL: stills:science-publications]The science and mathematics results follow the same pattern. V4-Pro scored 90.1 percent on GPQA Diamond, which tests PhD-level science reasoning across biology, chemistry, and physics. It scored 87.5 percent on MMLU-Pro, the advanced multidomain knowledge benchmark. On the HMMT 2026 February math competition benchmark, it scored 95.2 percent. In certain technical domains, V4-Pro is not near-frontier. It IS the frontier.
[STAT CARD: "GPQA Diamond: 90.1% PhD-level"] [STAT CARD: "MMLU-Pro: 87.5% multidomain"] [STAT CARD: "HMMT Feb 2026 math competition"] [STAT CARD: "HMMT math: 95.2% — frontier tier"] [B-ROLL: code-terminal]Long-context performance holds up under structured testing. On MRCR 1M, a retrieval benchmark designed to test whether a model can find and correctly use specific information buried anywhere inside a 1 million token context, V4-Pro scored 83.5 percent. That is notable because many models that advertise large context windows perform poorly on retrieval tasks at the extreme of their stated limit, delivering a headline number without reliable performance. The context window is not just a marketing figure.
[STAT CARD: "MRCR 1M: 83.5% retrieval score"] [B-ROLL: screen-capture:benchmark-chart]On Terminal-Bench 2.0, a benchmark for autonomous coding agents that must navigate a real terminal environment, V4-Pro scored 67.9 percent. GPT-5.5 scores 82.7 percent on the same benchmark. That gap matters for anyone building fully agentic systems. On SWE-bench Pro, a harder and more recent coding evaluation, V4-Pro scored 55.4 percent, suggesting that headline SWE-bench number does not capture performance on more complex tasks. And on broad world-knowledge tasks, MIT Technology Review reports that V4-Pro still trails Gemini 3.1 Pro and GPT-5.4 by an estimated 3 to 6 months. The picture is mixed, but V4-Pro excels where developers spend most of their compute budget.
[STAT CARD: "Terminal-Bench: 67.9% vs GPT 82.7%"] [STAT CARD: "SWE-bench Pro: 55.4% harder tasks"] [B-ROLL: company-logo:Alibaba]Against other open-source competitors, the story is less mixed. Alibaba's Qwen and Z.ai's GLM, the two primary open-weight rivals, are both outperformed by V4-Pro on coding and mathematics benchmarks. DeepSeek has consolidated its position as the dominant open-weight AI provider globally, and V4 extends that lead substantially. For any team currently benchmarking Qwen or GLM for production workloads, V4-Flash's pricing and context window make it a direct replacement worth evaluating.
[B-ROLL: data-center]The architecture behind the performance is a genuine engineering achievement. DeepSeek built what they call Hybrid Attention, combining Compressed Sparse Attention and Heavily Compressed Attention mechanisms to handle extremely long sequences without the quadratic growth in memory and compute cost that standard attention incurs. At 1 million token context length, V4-Pro requires only 27 percent of the inference FLOPs that V3.2 needed for the same context. The KV cache, the memory structure used to track what the model has processed, drops to just 10 percent of V3.2's requirement. The model got more capable at long context AND more efficient to run at long context simultaneously. V4-Pro was trained on more than 32 trillion tokens, among the largest training datasets for any publicly released model.
[STAT CARD: "27% of V3.2 FLOPs at 1M ctx"] [STAT CARD: "KV cache: only 10% of V3.2 needs"] [STAT CARD: "Trained on 32T tokens of data"] [B-ROLL: finance-charts]The pricing makes that efficiency impossible to ignore. V4-Pro costs $1.74 per million input tokens and $3.48 per million output tokens. GPT-5.5 costs $5 per million input tokens and $30 per million output tokens. Claude Opus 4.7 costs $5 per million input tokens and $25 per million output tokens. For tasks where V4-Pro matches frontier quality, it runs at roughly one-sixth the cost of frontier competitors. For teams running millions of coding or reasoning queries per month, that is the difference between a workload being economically viable or not. V4-Flash takes that logic further: $0.14 per million input tokens and $0.28 per million output tokens, making it one of the cheapest capable AI APIs available anywhere in the world.
[STAT CARD: "V4-Pro: $1.74/M in vs $5 GPT-5.5"] [STAT CARD: "V4-Pro output: $3.48/M tokens"] [STAT CARD: "GPT-5.5 output: $30/M tokens"] [STAT CARD: "Claude Opus output: $25/M tokens"] [STAT CARD: "V4-Flash: $0.14/M input tokens"] [STAT CARD: "V4-Flash output: $0.28/M tokens"] [/VOICEOVER] [TALKING HEAD — transition]Markets responded immediately. On V4 launch day, shares of SMIC, the Chinese chipmaker that manufactures Huawei's Ascend AI processors, jumped 10 percent in Hong Kong trading. Chinese AI competitors MiniMax and Knowledge Atlas fell over 9 percent. Investors read the same story the benchmarks told: DeepSeek is pulling away from its domestic rivals, and the Huawei chip angle signals something larger than a routine product update.
[CUT] [VOICEOVER — scene 3] [B-ROLL: military]V4 is DeepSeek's first release with explicit optimization for Huawei Ascend processors. The United States has spent years enforcing export controls designed to block China's access to advanced AI training hardware, particularly Nvidia's high-end GPUs. The strategy rests on a single premise: limit chip access, limit frontier AI development. V4's Huawei optimization puts direct pressure on that premise.
[B-ROLL: news-studio]Nvidia CEO Jensen Huang has said plainly what is at stake. His words: "The day that DeepSeek comes out on Huawei first, that is a horrible outcome for the U.S." V4 is not fully there yet. Liu Zhiyuan of Tsinghua University told MIT Technology Review that the majority of V4 training likely still depended on Nvidia hardware, suggesting partial rather than full chip independence. But the Huawei optimization is a step toward the scenario Huang described, not away from it. The export controls may still be buying time. It is becoming less clear how much time.
[B-ROLL: courtroom]The IP dispute surrounding DeepSeek has intensified alongside V4. White House Office of Science and Technology Policy Director Michael Kratsios stated: "There is nothing innovative about systematically extracting and copying the innovations of American industry." Anthropic traced 24,000 fake accounts and 16 million unauthorized Claude API queries to Chinese AI firms, including DeepSeek, MiniMax, and Moonshot AI, as part of apparent model distillation operations. DeepSeek denies wrongdoing. The Frontier Model Forum, representing OpenAI, Anthropic, and Google, held a joint announcement on April 6 and 7, 2026, pledging to share intelligence to counter these distillation operations. DeepSeek released V4 eighteen days after that announcement.
[STAT CARD: "24000 fake accounts traced"] [STAT CARD: "16M unauthorized Claude queries"] [B-ROLL: code-terminal]DeepSeek has not provided a timeline for moving V4 out of Preview status. The API documentation notes that legacy deepseek-chat and deepseek-reasoner models will be fully retired on July 24, 2026, giving V4 roughly three months to prove stable in production before it becomes the only option on DeepSeek's platform.
[/VOICEOVER] [TALKING HEAD — sign-off]The AI race we are living through has stopped looking like a pure technology competition and started looking like something harder to define. There are export controls, intelligence coalitions, IP lawsuits, Huawei chips, and market swings wrapped around every significant model release. DeepSeek V4 is the clearest proof yet that the coding and reasoning gap between American frontier AI and Chinese open-source AI has EFFECTIVELY CLOSED. The open-source world now has a model that matches frontier performance on the tasks developers pay the most to run, available to anyone on the planet, for a fraction of the price. Whatever happens next in this race, benchmarks will not decide it alone.
YouTube Description
Titles
- Top Pick: "DeepSeek V4 Preview Matches Frontier Code at One-Sixth the Price" (64 chars). Factual, search-optimized, leads with the performance-price contrast that drives the story.
- Alternate 1: "China's DeepSeek Just Outscored OpenAI on Coding — For Free" (59 chars). Drama-forward: uses the Codeforces upset and the open-source angle to create urgency.
- Alternate 2: "Why DeepSeek V4's 1M-Token Context Changes Enterprise AI" (56 chars). Analyst framing: targets developers and enterprises evaluating production context-window needs.
Keywords
Thumbnail Brief
Expression. Serious and measured, slight forward-lean — the look of someone presenting significant findings, not alarm.
Head position. Slight 3/4 turn, eye-line direct to camera, chin level.
Wardrobe. Dark navy blazer, no visible jewelry, clean minimal styling.
Eye direction. Direct to camera, sharp and alert, no blink-moment.
Lighting. Hard key light from upper-left at ~5000K, subtle cyan rim on right shoulder, minimal fill.
Scene setup. Near-black charcoal background, soft cyan gradient on left side, faint circuit-schematic motif at 8% opacity upper-right quadrant.
Position. Top-left, bold stacked sans-serif spanning two-thirds of frame width; sub-line "DeepSeek V4 Matches Frontier" below.
Font. Extra-bold condensed sans-serif, all-caps; 90px headline / 34px sub-line.
Color scheme. White (#FFFFFF) headline on dark background; cyan (#00CFCF) accent underline bar; sub-line in light gray (#CCCCCC).
Accent detail. Thin 3px cyan (#00CFCF) left-edge border on headline block; dark vignette in bottom-right quarter to push Jane forward.
Position. Top-center, oversized display fraction with sub-label below; sub-line "Open Source Matches GPT-5.5."
Font. Ultra-bold display sans-serif all-caps; "1/6TH" in cyan (#00CFCF) at 100px; "THE COST" in white (#FFFFFF) at 60px.
Color scheme. Cyan primary (#00CFCF) + white on charcoal; sub-line in mid-gray (#AAAAAA) at 28px.
Accent detail. Horizontal 2px cyan separator between fraction and sub-line; subtle drop shadow on text block for readability on varied backgrounds.
Position. Bottom-left, two-line stat callout block with small "SWE-bench Verified" eyebrow label above the numbers.
Font. Bold monospace-style numerals at 72px; "vs" separator in cyan (#00CFCF) at 48px; eyebrow in all-caps #888888 at 20px.
Color scheme. White (#FFFFFF) numbers on semi-transparent dark panel (#000000 at 65% opacity); cyan (#00CFCF) accent on separator.
Accent detail. Rounded-corner backing panel with 2px cyan border; attribution "V4-Pro vs Claude Opus 4.6" beneath numbers in #777777 at 18px.
HeyGen Avatar Look
Copy-paste into HeyGen → Generate Look. Pair with a hero screen-grab exported as img/<slug>-hero.jpg.
Sources & References
Official
Media
- DeepSeek previews new AI model that 'closes the gap' with frontier models
- Three reasons why DeepSeek's new model matters
- DeepSeek unveils its newest model at rock-bottom prices and with 'full support' from Huawei chips
- China's DeepSeek releases new AI model V4. Here's everything to know as the AI race speeds up