=== TAG === AI Models === HEADLINE === DeepSeek V4 Preview Matches Frontier Code at One-Sixth the Price === META_DESC === DeepSeek released V4 Preview on April 24, 2026, two MIT-licensed models with 1M-token context windows — V4-Pro scores 80.6% on SWE-bench, matching Claude Opus 4.6, at one-sixth the price of GPT-5.5. === DATE === April 24, 2026 === AUTHOR === Jane Sterling === READ_TIME === 9-minute read === HERO_IMG === img/content.png === SCRIPT_LABEL === Video Script (9 min, clean transcript for captioning) === SCRIPT === One day after OpenAI released GPT-5.5, DeepSeek struck back. On April 24, 2026, the Chinese AI lab announced DeepSeek V4 Preview, releasing two new models simultaneously under the MIT open-source license: V4-Pro and V4-Flash. Free to use, modify, and deploy commercially. The timing was not accidental. DeepSeek has made a habit of landing its biggest releases at the precise moment the American AI industry is focused on its own headlines, and V4 continues that pattern. Twelve months ago, DeepSeek V3's surprise arrival shook the global AI industry and triggered a sharp selloff in American AI and semiconductor stocks. That episode forced a complete reassessment of how much compute was actually necessary to train competitive models, and it put every major American AI lab on notice that Chinese AI was no longer a generation behind. The question in April 2025 was whether DeepSeek's V3 performance was a one-time anomaly or a repeatable pattern. V4 answers that question. V4-Pro is a massive mixture-of-experts model with 1.6 trillion total parameters. On any single inference pass, it activates 49 billion of those parameters. V4-Flash is the lighter sibling: 284 billion total parameters, 13 billion active per inference pass. Both models ship with a 1 million token context window by default. The predecessor, V3.2, had a 128K context window.
DeepSeek raised that ceiling nearly eightfold, handing the open-source community a context length that matches what the closed-source frontier labs charge premium prices to access. A 1 million token context window is not just a bigger number. It is a different category of capability. You can feed an entire large software repository into a single prompt. You can load a year of medical records, a full legal discovery set, or months of research documents without chunking, without retrieval pipelines, without stitching responses together. Engineers have spent years building workarounds for short context limits. V4 makes most of those workarounds obsolete, and it ships as open source, for free. DeepSeek published the model weights on Hugging Face. V4-Pro downloads at 865 gigabytes. V4-Flash comes in at 160 gigabytes. Both are available for local and self-hosted deployment, meaning any organization can run these models inside their own infrastructure without sending data to DeepSeek's servers. Multiple countries have already banned or restricted DeepSeek's cloud services over data privacy and national security concerns. The open weights sidestep those restrictions entirely. The benchmark numbers are where V4-Pro gets genuinely difficult to dismiss. Start with software engineering. On SWE-bench Verified, the leading benchmark for real GitHub issue resolution, V4-Pro scored 80.6 percent. Claude Opus 4.6 scored 80.8 percent. That is a gap of 0.2 percentage points between a free-to-download open-source model and one of the most capable closed-source models in the world. Anyone looking at that leaderboard who expected a larger gap got a rude surprise. On competitive programming, V4-Pro did not just approach the American frontier. It passed it. The model achieved a Codeforces competitive programming rating of 3,206. GPT-5.4 scores 3,168 on the same leaderboard.
That makes DeepSeek V4-Pro the highest-rated open-source model in competitive programming history at the time of release, and it outscored the OpenAI equivalent at that specific task. A Codeforces rating in that range places the model among elite competitive programmers, capable of solving algorithmic problems that most professional engineers cannot crack. The science and mathematics results follow the same pattern. V4-Pro scored 90.1 percent on GPQA Diamond, which tests PhD-level science reasoning across biology, chemistry, and physics. It scored 87.5 percent on MMLU-Pro, the advanced multidomain knowledge benchmark. On the HMMT 2026 February math competition benchmark, it scored 95.2 percent. In certain technical domains, V4-Pro is not near-frontier. It IS the frontier. Long-context performance holds up under structured testing. On MRCR 1M, a retrieval benchmark designed to test whether a model can find and correctly use specific information buried anywhere inside a 1 million token context, V4-Pro scored 83.5 percent. That is notable because many models that advertise large context windows perform poorly on retrieval tasks at the extreme of their stated limit, delivering a headline number without reliable performance. The context window is not just a marketing figure. On Terminal-Bench 2.0, a benchmark for autonomous coding agents that must navigate a real terminal environment, V4-Pro scored 67.9 percent. GPT-5.5 scores 82.7 percent on the same benchmark. That gap matters for anyone building fully agentic systems. On SWE-bench Pro, a harder and more recent coding evaluation, V4-Pro scored 55.4 percent, suggesting the headline SWE-bench number does not capture performance on more complex tasks. And on broad world-knowledge tasks, MIT Technology Review reports that V4-Pro still trails Gemini 3.1 Pro and GPT-5.4 by an estimated 3 to 6 months. The picture is mixed, but V4-Pro excels where developers spend most of their compute budget.
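The MRCR-style retrieval claim is easier to picture with the shape of the test in mind. The sketch below is a toy needle-in-a-haystack harness, not DeepSeek's benchmark: `build_haystack` and `call_model` are hypothetical names, and `call_model` stands in for a real model API, stubbed here with a plain string search so the harness runs end to end.

```python
import random

def build_haystack(needle: str, n_filler: int, seed: int = 0) -> str:
    """Bury a needle sentence at a random position among filler sentences."""
    rng = random.Random(seed)
    filler = [f"Log entry {i}: routine status, nothing notable." for i in range(n_filler)]
    pos = rng.randrange(len(filler) + 1)
    filler.insert(pos, needle)
    return " ".join(filler)

def call_model(context: str, question: str) -> str:
    """Placeholder for a real LLM call; a naive string search stands in here."""
    for sentence in context.split(". "):
        if "magic number" in sentence:
            return sentence.split()[-1].rstrip(".")
    return ""

needle = "The magic number for this audit is 7316."
context = build_haystack(needle, n_filler=5000)
answer = call_model(context, "What is the magic number?")
print(answer)  # prints 7316
```

A real MRCR run swaps the stub for an actual model call and scores whether the answer survives at positions throughout the full 1 million token window; that positional sweep is what separates a usable context window from a headline number.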
Against other open-source competitors, the story is less mixed. Alibaba's Qwen and Z.ai's GLM, the two primary open-weight rivals, are both outperformed by V4-Pro on coding and mathematics benchmarks. DeepSeek has consolidated its position as the dominant open-weight AI provider globally, and V4 extends that lead substantially. For any team currently benchmarking Qwen or GLM for production workloads, V4-Flash's pricing and context window make it a direct replacement worth evaluating. The architecture behind the performance is a genuine engineering achievement. DeepSeek built what they call Hybrid Attention, combining Compressed Sparse Attention and Heavily Compressed Attention mechanisms to handle extremely long sequences without the quadratic growth in memory and compute cost that plagues standard attention. At 1 million token context length, V4-Pro requires only 27 percent of the inference FLOPs that V3.2 needed for the same context. The KV cache, the memory structure used to track what the model has processed, drops to just 10 percent of V3.2's requirement. The model got more capable at long context AND more efficient to run at long context simultaneously. V4-Pro was trained on more than 32 trillion tokens, among the largest training datasets for any publicly released model. The pricing makes that efficiency impossible to ignore. V4-Pro costs $1.74 per million input tokens and $3.48 per million output tokens. GPT-5.5 costs $5 per million input tokens and $30 per million output tokens. Claude Opus 4.7 costs $5 per million input tokens and $25 per million output tokens. For tasks where V4-Pro matches frontier quality, it runs at roughly one-sixth the cost of frontier competitors. For teams running millions of coding or reasoning queries per month, that is the difference between a workload being economically viable or not.
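The one-sixth figure depends on the workload's input-to-output mix, and a quick back-of-envelope calculation makes that concrete. The sketch below uses the per-million-token prices quoted above; the monthly volume (100M input, 50M output tokens) is an assumed workload for illustration, not a figure from any provider.

```python
# Per-million-token API prices quoted in this piece (USD).
PRICES = {
    "DeepSeek V4-Pro": {"input": 1.74, "output": 3.48},
    "GPT-5.5":         {"input": 5.00, "output": 30.00},
    "Claude Opus 4.7": {"input": 5.00, "output": 25.00},
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Cost in USD for a workload of input_m / output_m million tokens."""
    p = PRICES[model]
    return input_m * p["input"] + output_m * p["output"]

# Assumed workload: 100M input + 50M output tokens per month.
v4 = monthly_cost("DeepSeek V4-Pro", 100, 50)   # 174 + 174  = 348
gpt = monthly_cost("GPT-5.5", 100, 50)          # 500 + 1500 = 2000
print(f"V4-Pro: ${v4:,.0f}  GPT-5.5: ${gpt:,.0f}  ratio: {gpt / v4:.1f}x")
```

For this assumed mix the ratio lands near six. Output-heavy workloads push it higher (the output-price gap alone is 30 / 3.48, about 8.6x), while input-heavy ones pull it toward the input-price gap of roughly 2.9x.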
V4-Flash takes that logic further: $0.14 per million input tokens and $0.28 per million output tokens, making it one of the cheapest capable AI APIs available anywhere in the world. Markets responded immediately. On V4 launch day, shares of SMIC, the Chinese chipmaker that manufactures Huawei's Ascend AI processors, jumped 10 percent in Hong Kong trading. Chinese AI competitors MiniMax and Knowledge Atlas fell over 9 percent. Investors read the same story the benchmarks told: DeepSeek is pulling away from its domestic rivals, and the Huawei chip angle signals something larger than a routine product update. V4 is DeepSeek's first release with explicit optimization for Huawei Ascend processors. The United States has spent years enforcing export controls designed to block China's access to advanced AI training hardware, particularly Nvidia's high-end GPUs. The strategy rests on a single premise: limit chip access, limit frontier AI development. V4's Huawei optimization puts direct pressure on that premise. Nvidia CEO Jensen Huang has said plainly what is at stake. His words: "The day that DeepSeek comes out on Huawei first, that is a horrible outcome for the U.S." V4 is not fully there yet. Liu Zhiyuan of Tsinghua University told MIT Technology Review that the majority of V4 training likely still depended on Nvidia hardware, suggesting partial rather than full chip independence. But the Huawei optimization is a step toward the scenario Huang described, not away from it. The export controls may still be buying time. It is becoming less clear how much time. The IP dispute surrounding DeepSeek has intensified alongside V4. White House Office of Science and Technology Policy Director Michael Kratsios stated: "There is nothing innovative about systematically extracting and copying the innovations of American industry." 
Anthropic traced 24,000 fake accounts and 16 million unauthorized Claude API queries to Chinese AI firms, including DeepSeek, MiniMax, and Moonshot AI, as part of apparent model distillation operations. DeepSeek denies wrongdoing. The Frontier Model Forum, representing OpenAI, Anthropic, and Google, issued a joint announcement on April 6 and 7, 2026, pledging to share intelligence to counter these distillation operations. DeepSeek released V4 eighteen days after that announcement. DeepSeek has not provided a timeline for moving V4 out of Preview status. The API documentation notes that legacy deepseek-chat and deepseek-reasoner models will be fully retired on July 24, 2026, giving V4 roughly three months to prove stable in production before it becomes the only option on DeepSeek's platform. The AI race we are living through has stopped looking like a pure technology competition and started looking like something harder to define. There are export controls, intelligence coalitions, IP lawsuits, Huawei chips, and market swings wrapped around every significant model release. DeepSeek V4 is the clearest proof yet that the coding and reasoning gap between American frontier AI and Chinese open-source AI has EFFECTIVELY CLOSED. The open-source world now has a model that matches frontier performance on the tasks developers pay the most to run, available to anyone on the planet, for a fraction of the price. Whatever happens next in this race, benchmarks will not decide it alone. === SCRIPT_HTML === === ANNOTATED_LABEL === Annotated Script (with b-roll & cut cues) === ANNOTATED_HTML === [TALKING HEAD — hook]
One day after OpenAI released GPT-5.5, DeepSeek struck back.
[CUT] [VOICEOVER — scene 1] [B-ROLL: ai-abstract]On April 24, 2026, the Chinese AI lab announced DeepSeek V4 Preview, releasing two new models simultaneously under the MIT open-source license: V4-Pro and V4-Flash. Free to use, modify, and deploy commercially. The timing was not accidental. DeepSeek has made a habit of landing its biggest releases at the precise moment the American AI industry is focused on its own headlines, and V4 continues that pattern.
[B-ROLL: company-logo:DeepSeek]Twelve months ago, DeepSeek V3's surprise arrival shook the global AI industry and triggered a sharp selloff in American AI and semiconductor stocks. That episode forced a complete reassessment of how much compute was actually necessary to train competitive models, and it put every major American AI lab on notice that Chinese AI was no longer a generation behind. The question in April 2025 was whether DeepSeek's V3 performance was a one-time anomaly or a repeatable pattern. V4 answers that question.
[STAT CARD: "V3 debut: April 2025 shockwave"] [B-ROLL: data-center]V4-Pro is a massive mixture-of-experts model with 1.6 trillion total parameters. On any single inference pass, it activates 49 billion of those parameters. V4-Flash is the lighter sibling: 284 billion total parameters, 13 billion active per inference pass. Both models ship with a 1 million token context window by default. The predecessor, V3.2, had a 128K context window. DeepSeek raised that ceiling nearly eightfold, handing the open-source community a context length that matches what the closed-source frontier labs charge premium prices to access.
[STAT CARD: "1600B total params in V4-Pro"] [STAT CARD: "V4-Pro: 49B params active per pass"] [STAT CARD: "V4-Flash: 284B params, 13B active"] [B-ROLL: code-terminal]A 1 million token context window is not just a bigger number. It is a different category of capability. You can feed an entire large software repository into a single prompt. You can load a year of medical records, a full legal discovery set, or months of research documents without chunking, without retrieval pipelines, without stitching responses together. Engineers have spent years building workarounds for short context limits. V4 makes most of those workarounds obsolete, and it ships as open source, for free.
[B-ROLL: data-center]DeepSeek published the model weights on Hugging Face. V4-Pro downloads at 865 gigabytes. V4-Flash comes in at 160 gigabytes. Both are available for local and self-hosted deployment, meaning any organization can run these models inside their own infrastructure without sending data to DeepSeek's servers. Multiple countries have already banned or restricted DeepSeek's cloud services over data privacy and national security concerns. The open weights sidestep those restrictions entirely.
[STAT CARD: "V4-Pro download: 865 GB model"] [STAT CARD: "V4-Flash download: 160 GB model"] [/VOICEOVER] [TALKING HEAD — transition]The benchmark numbers are where V4-Pro gets genuinely difficult to dismiss. Start with software engineering. On SWE-bench Verified, the leading benchmark for real GitHub issue resolution, V4-Pro scored 80.6 percent. Claude Opus 4.6 scored 80.8 percent. That is a gap of 0.2 percentage points between a free-to-download open-source model and one of the most capable closed-source models in the world. Anyone looking at that leaderboard who expected a larger gap got a rude surprise.
[STAT CARD: "SWE-bench: 80.6% vs 80.8% Claude"] [CUT] [VOICEOVER — scene 2] [B-ROLL: finance-charts]On competitive programming, V4-Pro did not just approach the American frontier. It passed it. The model achieved a Codeforces competitive programming rating of 3,206. GPT-5.4 scores 3,168 on the same leaderboard. That makes DeepSeek V4-Pro the highest-rated open-source model in competitive programming history at the time of release, and it outscored the OpenAI equivalent at that specific task. A Codeforces rating in that range places the model among elite competitive programmers, capable of solving algorithmic problems that most professional engineers cannot crack.
[STAT CARD: "Codeforces: 3206 — #1 open source"] [STAT CARD: "GPT-5.4 Codeforces rating: 3168"] [B-ROLL: stills:science-publications]The science and mathematics results follow the same pattern. V4-Pro scored 90.1 percent on GPQA Diamond, which tests PhD-level science reasoning across biology, chemistry, and physics. It scored 87.5 percent on MMLU-Pro, the advanced multidomain knowledge benchmark. On the HMMT 2026 February math competition benchmark, it scored 95.2 percent. In certain technical domains, V4-Pro is not near-frontier. It IS the frontier.
[STAT CARD: "GPQA Diamond: 90.1% PhD-level"] [STAT CARD: "MMLU-Pro: 87.5% multidomain"] [STAT CARD: "HMMT Feb 2026 math competition"] [STAT CARD: "HMMT math: 95.2% — frontier tier"] [B-ROLL: code-terminal]Long-context performance holds up under structured testing. On MRCR 1M, a retrieval benchmark designed to test whether a model can find and correctly use specific information buried anywhere inside a 1 million token context, V4-Pro scored 83.5 percent. That is notable because many models that advertise large context windows perform poorly on retrieval tasks at the extreme of their stated limit, delivering a headline number without reliable performance. The context window is not just a marketing figure.
[STAT CARD: "MRCR 1M: 83.5% retrieval score"] [B-ROLL: screen-capture:benchmark-chart]On Terminal-Bench 2.0, a benchmark for autonomous coding agents that must navigate a real terminal environment, V4-Pro scored 67.9 percent. GPT-5.5 scores 82.7 percent on the same benchmark. That gap matters for anyone building fully agentic systems. On SWE-bench Pro, a harder and more recent coding evaluation, V4-Pro scored 55.4 percent, suggesting the headline SWE-bench number does not capture performance on more complex tasks. And on broad world-knowledge tasks, MIT Technology Review reports that V4-Pro still trails Gemini 3.1 Pro and GPT-5.4 by an estimated 3 to 6 months. The picture is mixed, but V4-Pro excels where developers spend most of their compute budget.
[STAT CARD: "Terminal-Bench: 67.9% vs GPT 82.7%"] [STAT CARD: "SWE-bench Pro: 55.4% harder tasks"] [B-ROLL: company-logo:Alibaba]Against other open-source competitors, the story is less mixed. Alibaba's Qwen and Z.ai's GLM, the two primary open-weight rivals, are both outperformed by V4-Pro on coding and mathematics benchmarks. DeepSeek has consolidated its position as the dominant open-weight AI provider globally, and V4 extends that lead substantially. For any team currently benchmarking Qwen or GLM for production workloads, V4-Flash's pricing and context window make it a direct replacement worth evaluating.
[B-ROLL: data-center]The architecture behind the performance is a genuine engineering achievement. DeepSeek built what they call Hybrid Attention, combining Compressed Sparse Attention and Heavily Compressed Attention mechanisms to handle extremely long sequences without the quadratic growth in memory and compute cost that plagues standard attention. At 1 million token context length, V4-Pro requires only 27 percent of the inference FLOPs that V3.2 needed for the same context. The KV cache, the memory structure used to track what the model has processed, drops to just 10 percent of V3.2's requirement. The model got more capable at long context AND more efficient to run at long context simultaneously. V4-Pro was trained on more than 32 trillion tokens, among the largest training datasets for any publicly released model.
[STAT CARD: "27% of V3.2 FLOPs at 1M ctx"] [STAT CARD: "KV cache: only 10% of V3.2 needs"] [STAT CARD: "Trained on 32T tokens of data"] [B-ROLL: finance-charts]The pricing makes that efficiency impossible to ignore. V4-Pro costs $1.74 per million input tokens and $3.48 per million output tokens. GPT-5.5 costs $5 per million input tokens and $30 per million output tokens. Claude Opus 4.7 costs $5 per million input tokens and $25 per million output tokens. For tasks where V4-Pro matches frontier quality, it runs at roughly one-sixth the cost of frontier competitors. For teams running millions of coding or reasoning queries per month, that is the difference between a workload being economically viable or not. V4-Flash takes that logic further: $0.14 per million input tokens and $0.28 per million output tokens, making it one of the cheapest capable AI APIs available anywhere in the world.
[STAT CARD: "V4-Pro: $1.74/M in vs $5 GPT-5.5"] [STAT CARD: "V4-Pro output: $3.48/M tokens"] [STAT CARD: "GPT-5.5 output: $30/M tokens"] [STAT CARD: "Claude Opus output: $25/M tokens"] [STAT CARD: "V4-Flash: $0.14/M input tokens"] [STAT CARD: "V4-Flash output: $0.28/M tokens"] [/VOICEOVER] [TALKING HEAD — transition]Markets responded immediately. On V4 launch day, shares of SMIC, the Chinese chipmaker that manufactures Huawei's Ascend AI processors, jumped 10 percent in Hong Kong trading. Chinese AI competitors MiniMax and Knowledge Atlas fell over 9 percent. Investors read the same story the benchmarks told: DeepSeek is pulling away from its domestic rivals, and the Huawei chip angle signals something larger than a routine product update.
[CUT] [VOICEOVER — scene 3] [B-ROLL: military]V4 is DeepSeek's first release with explicit optimization for Huawei Ascend processors. The United States has spent years enforcing export controls designed to block China's access to advanced AI training hardware, particularly Nvidia's high-end GPUs. The strategy rests on a single premise: limit chip access, limit frontier AI development. V4's Huawei optimization puts direct pressure on that premise.
[B-ROLL: news-studio]Nvidia CEO Jensen Huang has said plainly what is at stake. His words: "The day that DeepSeek comes out on Huawei first, that is a horrible outcome for the U.S." V4 is not fully there yet. Liu Zhiyuan of Tsinghua University told MIT Technology Review that the majority of V4 training likely still depended on Nvidia hardware, suggesting partial rather than full chip independence. But the Huawei optimization is a step toward the scenario Huang described, not away from it. The export controls may still be buying time. It is becoming less clear how much time.
[B-ROLL: courtroom]The IP dispute surrounding DeepSeek has intensified alongside V4. White House Office of Science and Technology Policy Director Michael Kratsios stated: "There is nothing innovative about systematically extracting and copying the innovations of American industry." Anthropic traced 24,000 fake accounts and 16 million unauthorized Claude API queries to Chinese AI firms, including DeepSeek, MiniMax, and Moonshot AI, as part of apparent model distillation operations. DeepSeek denies wrongdoing. The Frontier Model Forum, representing OpenAI, Anthropic, and Google, issued a joint announcement on April 6 and 7, 2026, pledging to share intelligence to counter these distillation operations. DeepSeek released V4 eighteen days after that announcement.
[STAT CARD: "24,000 fake accounts traced"] [STAT CARD: "16M unauthorized Claude queries"] [B-ROLL: code-terminal]DeepSeek has not provided a timeline for moving V4 out of Preview status. The API documentation notes that legacy deepseek-chat and deepseek-reasoner models will be fully retired on July 24, 2026, giving V4 roughly three months to prove stable in production before it becomes the only option on DeepSeek's platform.
[/VOICEOVER] [TALKING HEAD — sign-off]The AI race we are living through has stopped looking like a pure technology competition and started looking like something harder to define. There are export controls, intelligence coalitions, IP lawsuits, Huawei chips, and market swings wrapped around every significant model release. DeepSeek V4 is the clearest proof yet that the coding and reasoning gap between American frontier AI and Chinese open-source AI has EFFECTIVELY CLOSED. The open-source world now has a model that matches frontier performance on the tasks developers pay the most to run, available to anyone on the planet, for a fraction of the price. Whatever happens next in this race, benchmarks will not decide it alone.
=== ARTICLE_HTML === === YOUTUBE_DESC === DeepSeek V4 dropped one day after GPT-5.5 — and it's open source, free to download, with a 1 million token context window that matches Claude Opus on coding benchmarks at one-sixth the price. Sterling Intelligence covers the AI moves that matter. Subscribe for weekly in-depth analysis. On April 24, 2026, DeepSeek released V4 Preview: two MIT-licensed models, V4-Pro and V4-Flash, available for commercial use and self-hosting. The timing was deliberate — one day after OpenAI dropped GPT-5.5, and 18 days after the Frontier Model Forum pledged a joint intelligence-sharing operation against Chinese AI distillation. DeepSeek noticed, and responded. V4-Pro is a 1.6-trillion-parameter mixture-of-experts model activating 49 billion parameters per inference pass. Its context window jumped from 128K tokens to 1 million — matching what closed-source labs charge premium rates to provide. At that context length, V4-Pro runs on just 27% of the inference compute V3.2 required, and uses only 10% of the KV cache. That is not just scale. It is a different architectural approach to long-context efficiency, built on Hybrid Attention combining Compressed Sparse Attention and Heavily Compressed Attention. The model was trained on over 32 trillion tokens. The benchmark results demand attention. V4-Pro scored 80.6% on SWE-bench Verified — 0.2 points below Claude Opus 4.6's 80.8%. On Codeforces competitive programming, V4-Pro rated 3,206, surpassing GPT-5.4's 3,168 — the highest-rated open-source model in competitive programming history. On GPQA Diamond (PhD-level science), 90.1%. On HMMT 2026 math, 95.2%. On those specific tasks, V4-Pro does not approach the frontier. It IS the frontier. The gaps appear on autonomous agent tasks — V4-Pro scores 67.9% on Terminal-Bench 2.0 versus GPT-5.5's 82.7% — and on broad world-knowledge benchmarks, where MIT Technology Review estimates a 3–6 month lag behind Gemini 3.1 Pro. 
The pricing changes the calculation for every team running production AI workloads. V4-Pro costs $1.74 per million input tokens and $3.48 output. GPT-5.5 costs $5 input and $30 output. Claude Opus 4.7 is $5 input and $25 output. For equivalent coding or reasoning tasks, V4-Pro costs roughly one-sixth as much as frontier competitors. V4-Flash goes further: $0.14/$0.28 per million tokens, one of the cheapest capable APIs anywhere in the world. Both models are available as open weights — 865 GB for V4-Pro, 160 GB for V4-Flash — enabling self-hosted deployment that bypasses DeepSeek's servers entirely and sidesteps the data-privacy bans multiple countries have imposed on DeepSeek's cloud services. Geopolitics runs through the whole story. V4 is DeepSeek's first release with explicit Huawei Ascend chip optimization. US export controls were designed to cap China's AI training capability by limiting access to Nvidia's high-end GPUs. Nvidia CEO Jensen Huang said plainly: "The day that DeepSeek comes out on Huawei first, that is a horrible outcome for the U.S." V4 is not fully there yet — MIT Technology Review reports training still mostly depends on Nvidia hardware — but it is a step toward that outcome, not away from it. SMIC shares jumped 10% on V4 launch day; Chinese AI rivals MiniMax and Knowledge Atlas fell over 9%. ⏱ Chapters: 00:00 - Hook: DeepSeek Strikes Back 00:50 - V3 Disruption and What V4 Answers 02:00 - Architecture: Hybrid Attention and 1M Context 04:00 - Benchmark Breakdown: Code, Science, Math 06:15 - Pricing vs. GPT-5.5 and Claude Opus 07:15 - Markets, Huawei Chips, and Export Controls 08:15 - IP Dispute, Frontier Model Forum, and Sign-off #AI #DeepSeek #DeepSeekV4 #OpenSource #ArtificialIntelligence #ChatGPT #GPT55 #ClaudeOpus #AINews #MachineLearning #LLM #AIModels #ChineseAI #HuaweiAI #OpenAI #Anthropic #AIBenchmarks #SWEbench #Codeforces #AIRace === TITLES_HTML ===Expression. 
Serious and measured, slight forward-lean — the look of someone presenting significant findings, not alarm.
Head position. Slight 3/4 turn, eye-line direct to camera, chin level.
Wardrobe. Dark navy blazer, no visible jewelry, clean minimal styling.
Eye direction. Direct to camera, sharp and alert, no blink-moment.
Lighting. Hard key light from upper-left at ~5000K, subtle cyan rim on right shoulder, minimal fill.
Scene setup. Near-black charcoal background, soft cyan gradient on left side, faint circuit-schematic motif at 8% opacity upper-right quadrant.
Position. Top-left, bold stacked sans-serif spanning two-thirds of frame width; sub-line "DeepSeek V4 Matches Frontier" below.
Font. Extra-bold condensed sans-serif, all-caps; 90px headline / 34px sub-line.
Color scheme. White (#FFFFFF) headline on dark background; cyan (#00CFCF) accent underline bar; sub-line in light gray (#CCCCCC).
Accent detail. Thin 3px cyan (#00CFCF) left-edge border on headline block; dark vignette in bottom-right quarter to push Jane forward.
Position. Top-center, oversized display fraction with sub-label below; sub-line "Open Source Matches GPT-5.5."
Font. Ultra-bold display sans-serif all-caps; "1/6TH" in cyan (#00CFCF) at 100px; "THE COST" in white (#FFFFFF) at 60px.
Color scheme. Cyan primary (#00CFCF) + white on charcoal; sub-line in mid-gray (#AAAAAA) at 28px.
Accent detail. Horizontal 2px cyan separator between fraction and sub-line; subtle drop shadow on text block for readability on varied backgrounds.
Position. Bottom-left, two-line stat callout block with small "SWE-bench Verified" eyebrow label above the numbers.
Font. Bold monospace-style numerals at 72px; "vs" separator in cyan (#00CFCF) at 48px; eyebrow in all-caps #888888 at 20px.
Color scheme. White (#FFFFFF) numbers on semi-transparent dark panel (#000000 at 65% opacity); cyan (#00CFCF) accent on separator.
Accent detail. Rounded-corner backing panel with 2px cyan border; attribution "V4-Pro vs Claude Opus 4.6" beneath numbers in #777777 at 18px.