Video Script (9 min, 1,660 words)
SCENE ONE — THE DROP

OpenAI just did it again. Six weeks after shipping GPT-5.4, the company released GPT-5.5 on Thursday, April 23, 2026. The internal codename, according to leaks that have been all but confirmed this week, was Spud. I am not making that up. The most consequential AI release of the quarter, and the engineers named it after a potato.

That detail matters less than the release cadence. Six weeks. Let that sit with you. OpenAI is now shipping frontier models at a pace that was unimaginable eighteen months ago. The gap between major releases has compressed from years, to months, to weeks.

GPT-5.5 is available right now inside ChatGPT for Plus, Pro, Business, and Enterprise subscribers. It is live in Codex, OpenAI's coding agent. The API is coming, quote, soon. The company trained and tested it with roughly two hundred early access partners before flipping the switch for the rest of us.

What does it actually do? According to OpenAI, GPT-5.5 is their smartest and most intuitive model yet. Those are marketing words. Here is what the marketing words translate to in practice. The model has been rebuilt around agentic behavior. Not just answering your question. Taking action. Writing code, debugging code, running code, researching the web, analyzing data, filling out spreadsheets, operating software across multiple tools, and keeping going until the task is actually finished.

Greg Brockman, the president of OpenAI, called it a new class of intelligence. He said it is, quote, a big step towards more agentic and intuitive computing. He also said, and I think THIS is the line worth remembering, it can look at an unclear problem and figure out what needs to happen next. That is the sales pitch in one sentence. Less hand holding. More autonomous execution.

Here is the bigger strategic picture, because I think most people are going to miss it. Brockman and Sam Altman have been telegraphing a super app strategy for months. 
The idea is to merge ChatGPT, Codex, and their AI browser into a single unified service. One interface. One workflow. One place where you talk to the AI, it writes the code, it runs the browser, it operates the other software, all in one session. GPT-5.5 is the engine for that vision. Not a chatbot upgrade. An ATTEMPT to make OpenAI the operating system of everyday work. That is the bet. Now let us look at whether the numbers support it.

SCENE TWO — THE NUMBERS

The benchmarks for GPT-5.5 are genuinely impressive. Some of them are field leading. Some of them are not. Let me give you the REAL picture.

On Terminal-Bench 2.0, which tests whether a model can plan and execute complex command line workflows, GPT-5.5 scored 82.7 percent. That is state of the art. Anthropic's Claude Opus 4.7 scored 69.4 percent on the same benchmark. Google's Gemini 3.1 Pro scored 68.5 percent. This is the number OpenAI is going to put on every slide for the next three months, and they earned it.

On OSWorld-Verified, the benchmark that measures whether a model can operate a real desktop computer, GPT-5.5 scored 78.7 percent. For context, GPT-5.4 scored 75 percent on the same test last month. The human baseline is 72.4 percent. The gap between AI and humans on this test is WIDENING every six weeks.

On GDPval, which tests professional knowledge work across forty four occupations, the model scored 84.9 percent. On Tau2-bench Telecom, which simulates complex customer service workflows, it hit 98 percent without any prompt tuning. On OpenAI's internal Expert-SWE benchmark, which measures long horizon coding tasks that take a human engineer around twenty hours, GPT-5.5 outperformed GPT-5.4 by a meaningful margin.

Now the bad news for OpenAI. On SWE-bench Pro, the benchmark that most serious software engineers actually care about, GPT-5.5 scored 58.6 percent. Claude Opus 4.7 is still leading this test at 64.3 percent. That is a 5.7 point gap. 
In the specific domain of real world GitHub issue resolution, Anthropic is still ahead. OpenAI is winning the agentic and computer use race. Anthropic is winning the raw coding race. If you care about which model actually closes pull requests in production, Claude is still your answer.

Now the pricing. And this is where I need you to pay attention, because this is not a small number. GPT-5.5 costs five dollars per million input tokens on the API. Thirty dollars per million output tokens. That is exactly DOUBLE the price of GPT-5.4, which was two dollars and fifty cents in, and fifteen out. GPT-5.5 Pro, the premium tier, costs thirty dollars per million input tokens and ONE HUNDRED AND EIGHTY dollars per million output tokens. That is a six times markup over the standard GPT-5.5 rate, and twelve times the old GPT-5.4 price. The context window stayed at one million tokens, which was the right call.

OpenAI's justification is that the model is much more token efficient. Meaning, it uses fewer tokens to complete the same task. OpenAI claims the effective cost per completed job actually goes DOWN despite the higher per token rate. Independent developers have not verified that claim yet. If you are running production workloads against the API, your bill is going to get materially larger before you know whether the efficiency claim holds up in your specific use case.

On the numbers alone, GPT-5.5 is a genuine step forward. The price increase is going to ANGER the developer community. The coding gap to Claude is going to anger enterprise buyers. The computer use lead is going to attract workflow automation companies. It is a MIXED picture, not a coronation.

SCENE THREE — THE REAL STORY

Now let me tell you what I actually think this release is about, because the benchmarks are the surface. OpenAI has a perception problem that predates GPT-5.5 by nine months. Every GPT-5 point release, starting with the original GPT-5 launch, has been followed by user backlash. The Reddit threads were brutal. 
The Change dot org petition demanding an older model back succeeded. Thousands of users called it a DOWNGRADE. GPT-5.2 users complained the safety refusals had become, quote, absurd. There is a real segment of the user base that believes OpenAI has been chasing benchmarks at the expense of the actual user experience.

GPT-5.5 is, in my reading, OpenAI's answer to that criticism. It is less about claiming the smartest number on a leaderboard. It is about claiming the model you can actually trust with complex work without babysitting it. The marketing language, intuitive, messy tasks, keeps going until done, all points in one direction. They want to win back the users who felt burned by the last three releases.

Whether they actually succeed will depend on things that are harder to measure. Is the censorship still cranked up past what most users want? Does it refuse reasonable requests? Does it gaslight you about what it just did? Those failure modes are where the prior releases lost trust, and no benchmark on earth captures them.

There are two things worth flagging that are legitimately concerning. The first is the release pace. Six weeks between frontier model launches is unprecedented. OpenAI says they ran the model through external safety evaluations and about two hundred early access partners. Critics have pointed out that the window for proper red teaming on a model this capable is probably longer than six weeks. In internal GPT-5.2 testing, certain jailbreak strategies pushed attack success rates from around four percent at baseline to seventy eight percent after multi turn prompting. That is a gap that takes time to close, and time is the one thing the release cadence does not allow.

The second is the super app strategy itself. Merging ChatGPT, Codex, and the browser into one product is a bigger concentration of user behavior than anything the tech industry has built in twenty years. If it works, it reshapes how people work. If it fails, it fails loudly. 
And if it works and stays closed and proprietary, the monopoly questions that the White House just addressed in its National AI Framework are going to come back, HARDER, fast.

There is also one rumor making the rounds that I want to flag without endorsing. Some observers have speculated that the Spud codename is OpenAI trolling its own rushed release process, calling the model a common potato in contrast to the gourmet positioning Anthropic gives its Claude releases. Others have read it as a confidence signal. Nothing fancy, just a staple. No confirmation either way from OpenAI. I mention it because the internet has decided the codename matters, and it keeps coming up in coverage.

So where does this leave us? GPT-5.5 is real. The benchmarks are real. The price hike is real. The coding gap to Claude is real. The computer use dominance is real. The super app ambition is real. And the trust deficit OpenAI has with its own power users is real.

If you are building an agent that operates across software tools, GPT-5.5 is probably your new default. If you are shipping production code against a single model API, you should run Claude Opus 4.7 in parallel and see which one actually resolves your issues. If you are just using ChatGPT day to day, the honest answer is wait a week and see how the user feedback lands, because OpenAI's track record on point releases is not great.

The AI race is now being run in six week sprints. Pay attention to who is winning which lap, and who is running out of breath. Stay sharp. Jane Sterling, Sterling Intelligence.
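Production note on the pricing claim (a back-of-envelope sketch, not anything OpenAI has published; the job sizes below are hypothetical): since both the input and output rates exactly doubled from GPT-5.4 to GPT-5.5, OpenAI's "effective cost per completed job goes down" claim only holds if GPT-5.5 finishes the same work in fewer than half the tokens, assuming the same input/output mix.

```python
# Back-of-envelope check on the token-efficiency claim.
# Prices per million tokens, as quoted in the script.
OLD_IN, OLD_OUT = 2.50, 15.00   # GPT-5.4
NEW_IN, NEW_OUT = 5.00, 30.00   # GPT-5.5 (exactly 2x on both rates)

def job_cost(in_tokens, out_tokens, price_in, price_out):
    """Dollar cost of one job given raw token counts and per-million rates."""
    return in_tokens / 1e6 * price_in + out_tokens / 1e6 * price_out

# Hypothetical job: 50k input tokens, 20k output tokens on GPT-5.4.
old = job_cost(50_000, 20_000, OLD_IN, OLD_OUT)            # $0.425

# With both rates doubled, GPT-5.5 breaks even only at HALF the tokens.
new_same_mix = job_cost(25_000, 10_000, NEW_IN, NEW_OUT)   # $0.425, break-even
assert abs(old - new_same_mix) < 1e-9
```

Until independent developers publish real token counts per completed task, the efficiency claim stays unverified either way.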

YouTube Description

OpenAI just dropped GPT-5.5 — and they DOUBLED the API price to do it. We break down what actually changed, who wins, who loses, and whether the benchmarks justify the bill.

GPT-5.5 (codename: Spud) launched April 23, 2026 — just six weeks after GPT-5.4. OpenAI president Greg Brockman is calling it "a new class of intelligence" and "a big step towards more agentic and intuitive computing." It is live in ChatGPT Plus, Pro, Business, and Enterprise. It is live in Codex. API access is coming soon.

The benchmark headlines: 82.7% on Terminal-Bench 2.0 (state of the art, beating Claude Opus 4.7's 69.4% and Gemini 3.1 Pro's 68.5%). 78.7% on OSWorld-Verified, up from 75% on GPT-5.4. 84.9% on GDPval. 98.0% on Tau2-bench Telecom. OpenAI is clearly winning the agentic and computer-use race.

But here's the part OpenAI will not put on its slides: on SWE-bench Pro — the benchmark for real-world GitHub issue resolution — GPT-5.5 scored 58.6%. Anthropic's Claude Opus 4.7 still leads that benchmark at 64.3%. If you are shipping production code, Claude is still your answer.

The pricing: $5 per million input tokens, $30 per million output tokens. Exactly 2x GPT-5.4. GPT-5.5 Pro goes to $30 in, $180 out. The context window stayed at 1M tokens.

In this episode, Jane Sterling breaks down the release, the real numbers behind it, the competitive landscape, the strategic "super app" play, the backlash risk from prior GPT-5 point releases, and what all of it means for developers, enterprises, and everyday ChatGPT users.

⏱ Timestamps
00:00 Scene One — The Drop
03:00 Scene Two — The Numbers
06:00 Scene Three — The Real Story

🔔 Subscribe to Sterling Intelligence for weekly AI coverage that cuts through the hype.
https://www.youtube.com/@SterlingIntelligence

No hype. No filler. Just the signal. 
— Jane Sterling, Sterling Intelligence #GPT55 #OpenAI #ChatGPT #AINews #AgenticAI #ClaudeOpus #SuperApp #AIWeekly #SterlingIntelligence #JaneSterling #AIRace #GPT5 #AIBenchmarks #OpenAIAPI #AIPricing

Titles

Keywords

GPT-5.5, GPT 5.5, ChatGPT, OpenAI, Sam Altman, Greg Brockman, GPT-5.5 Pro, Spud, Codex, AI super app, agentic AI, Terminal-Bench, OSWorld, GDPval, SWE-bench Pro, Claude Opus 4.7, Gemini 3.1 Pro, Anthropic, AI benchmarks, AI pricing, AI coding, computer use AI, AI agents, AI race, AI news 2026, Sterling Intelligence, Jane Sterling, AI weekly, OpenAI API, artificial intelligence

Thumbnail Brief

Jane's Appearance & Framing

Expression. Shocked-deadpan. Eyes slightly widened, mouth closed but corners turned down in a "can you believe this" micro-expression. Not smiling. Not laughing. The face you make when someone shows you a number that does not seem real.

Head position. Slightly turned to her right (viewer's left), chin very slightly down, leaning subtly forward toward camera. Creates tension and forward energy.

Wardrobe. Dark blazer or structured jacket, minimalist. Keep consistent with existing Sterling Intelligence brand — no loud colors, no jewelry that catches light. The face and the text do the work.

Eye direction. Direct to camera (at the viewer). Confrontational, not friendly. Alternate take: eyes flicked sharply toward the overlay text — sells the "look at this" read.

Lighting. Dramatic key light from upper-left, soft fill on the right, deep shadow on one side of the face. Color temp around 4800–5000K, subtle warm tint on skin. Slight rim light on hair/shoulder from behind to pop her off the background.

Scene setup. Dark charcoal/black background with a subtle teal-green glow in the far background (references the OpenAI brand palette without shouting it). Shallow depth of field — Jane tack-sharp, background blurred. Optional: a ghosted GPT-5.5 "model card" graphic or dollar-sign motif floating behind her right shoulder at ~20% opacity.

Option 1 — Best (Price Angle)
DOUBLE THE PRICE

Position. Bottom-left third of the frame, large block, stacked on two lines ("DOUBLE" on top, "THE PRICE" below).

Font. Bebas Neue Bold or Impact — heavy condensed sans-serif, all caps, tight tracking.

Color scheme. "DOUBLE" in bright red (#dc2626), 110% of base size, slight outer glow. "THE PRICE" in pure white (#ffffff), 80% of "DOUBLE" size. 3px black stroke around both blocks for legibility over any background.

Accent detail. Small gold tag below: "$5 / MILLION INPUT" in Inter Bold, 18px, #c8a84b gold. Acts as the proof stat that backs up the shock claim.

Option 2 — Release Cadence Angle
SIX WEEKS

Position. Centered upper-third, slightly left-of-center. Large.

Font. Inter Black or Montserrat Black, all caps.

Color scheme. "SIX" in gold (#c8a84b), "WEEKS" in white. Red underline under "WEEKS" at 4px height. 2px black stroke throughout.

Accent detail. Smaller subtitle below in white: "GPT-5.4 → GPT-5.5" with arrow in red. Communicates the compressed release gap as the story.

Option 3 — Computer-Use Angle
78.7% vs 72.4%

Position. Right side, stacked scoreboard style — "AI: 78.7%" over "HUMAN: 72.4%".

Font. JetBrains Mono Bold for the numbers (monospace reads as "data" / "benchmark"), Inter Bold for the labels.

Color scheme. "AI" row in gold on dark, "HUMAN" row in muted gray on dark. Numbers in pure white with a subtle red glow on the AI row to signal the winner.

Accent detail. Header strip across the top: "OSWORLD-VERIFIED" in small caps, 11px, gold. Makes it read as a credible data card, not a clickbait claim.

Sources & References

Official — OpenAI

Media Coverage

Analyst & Independent

Context — Prior Release Reception