OpenAI says its new GPT-5.5 model is more efficient and better at coding

OpenAI Launches GPT-5.5: A New Benchmark for Agentic Coding and Autonomous Work

On April 23, 2026, OpenAI unveiled GPT-5.5, describing it as the company’s “smartest and most intuitive to use model yet.” The release marks a significant leap forward in agentic AI — systems that don’t just answer questions but actively plan, execute, and verify multi-step tasks across coding, research, and knowledge work. Available immediately to ChatGPT Plus, Pro, Business, and Enterprise subscribers, GPT-5.5 represents OpenAI’s clearest signal that the future of AI is not in conversation but in autonomous action.

What Is GPT-5.5?

GPT-5.5 is designed to take messy, ambiguous, multi-part instructions and carry them through to completion without requiring the user to micromanage each step. According to OpenAI’s official announcement, the model “understands what you’re trying to do faster and can carry more of the work itself.” Its core capabilities include:

  • Writing and debugging code across large, complex codebases
  • Researching online, synthesizing findings, and checking its own work
  • Analyzing data and creating documents and spreadsheets autonomously
  • Operating software interfaces through computer-use capabilities
  • Moving across multiple tools until a task is fully finished

Unlike previous models that required careful step-by-step prompting, GPT-5.5 is built to plan, iterate, navigate ambiguity, and persist through long-horizon tasks that previously demanded sustained human oversight.

Benchmark Performance: How GPT-5.5 Compares to the Competition

OpenAI published an extensive benchmark comparison pitting GPT-5.5 against GPT-5.4, Anthropic’s Claude Opus 4.7, and Google’s Gemini 3.1 Pro. The results show meaningful gains across nearly every category.

On Terminal-Bench 2.0, which evaluates complex command-line workflows requiring planning, iteration, and tool coordination, GPT-5.5 achieved a state-of-the-art score of 82.7% — significantly ahead of GPT-5.4’s 75.1%, Claude Opus 4.7’s 69.4%, and Gemini 3.1 Pro’s 68.5%.

On SWE-Bench Pro, the standard benchmark for real-world GitHub issue resolution, GPT-5.5 scored 58.6%, edging past GPT-5.4 (57.7%) and Gemini 3.1 Pro (54.2%), though Claude Opus 4.7 still leads at 64.3% on this particular metric.

On OpenAI’s internal Expert-SWE benchmark — designed to evaluate long-horizon coding tasks with a median estimated human completion time of 20 hours — GPT-5.5 reached 73.1%, up from 68.5% for GPT-5.4.

Across all three coding evaluations, GPT-5.5 improves on GPT-5.4’s scores while using fewer tokens to reach them.
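The coding comparisons above come down to percentage-point arithmetic. The following is a minimal Python sketch that uses only the scores quoted in this article. The `SCORES` table and the helper names `point_gaps` and `leader` are illustrative assumptions, not part of any OpenAI tooling, and Expert-SWE lists only the two OpenAI models because no other figures were reported.

```python
# Scores in percent, copied from the figures quoted in this article.
# Expert-SWE is an internal OpenAI benchmark; only two models were reported.
SCORES = {
    "Terminal-Bench 2.0": {
        "GPT-5.5": 82.7,
        "GPT-5.4": 75.1,
        "Claude Opus 4.7": 69.4,
        "Gemini 3.1 Pro": 68.5,
    },
    "SWE-Bench Pro": {
        "GPT-5.5": 58.6,
        "GPT-5.4": 57.7,
        "Claude Opus 4.7": 64.3,
        "Gemini 3.1 Pro": 54.2,
    },
    "Expert-SWE": {
        "GPT-5.5": 73.1,
        "GPT-5.4": 68.5,
    },
}


def point_gaps(benchmark, baseline="GPT-5.5"):
    """Return baseline score minus each other model's score, in percentage points."""
    scores = SCORES[benchmark]
    base = scores[baseline]
    return {
        model: round(base - score, 1)
        for model, score in scores.items()
        if model != baseline
    }


def leader(benchmark):
    """Return the name of the highest-scoring model on a benchmark."""
    scores = SCORES[benchmark]
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: leader={leader(name)}, gaps vs GPT-5.5={point_gaps(name)}")
```

A positive gap means GPT-5.5 is ahead and a negative gap means it trails, which is the case only for Claude Opus 4.7 on SWE-Bench Pro.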

In knowledge work, GPT-5.5 scored 84.9% on GDPval (a benchmark testing agents’ ability to produce well-specified work across 44 occupations), 78.7% on OSWorld-Verified (measuring autonomous computer operation), and an impressive 98.0% on Tau2-bench Telecom for complex customer-service workflows — all without prompt tuning.

On scientific benchmarks, the model achieved 25.0% on GeneBench (multi-stage genetic data analysis, up from 19.0% for GPT-5.4) and 80.5% on BixBench (real-world bioinformatics tasks). On abstract reasoning, GPT-5.5 scored 85.0% on ARC-AGI-2, a substantial improvement from GPT-5.4’s 73.3%.

The GPT-5.5 Pro variant pushes further, achieving 90.1% on BrowseComp (deep research), 52.4% on FrontierMath Tiers 1–3, and 39.6% on the notoriously difficult FrontierMath Tier 4.

Efficiency: More Capable Without the Speed Tax

One of the most notable aspects of GPT-5.5 is that it achieves these gains without the latency penalty typically associated with larger, more capable models. OpenAI stated that GPT-5.5 matches GPT-5.4’s per-token latency in real-world serving conditions while using significantly fewer tokens to complete the same tasks in Codex.

On Artificial Analysis’s Coding Index, GPT-5.5 delivers state-of-the-art intelligence at approximately half the cost of competing frontier coding models, according to OpenAI’s analysis.

Behind the scenes, this efficiency came from a systematic redesign of the inference stack. GPT-5.5 was co-designed for, trained with, and served on NVIDIA GB200 and GB300 NVL72 systems. One critical optimization involved load balancing: Codex analyzed weeks of production traffic patterns and wrote custom heuristic algorithms to partition work across GPU cores, increasing token generation speeds by over 20%. According to OpenAI, GPT-5.5 itself helped identify and implement key improvements in the infrastructure that serves it.

Real-World Impact: What Early Testers Are Saying

OpenAI collected feedback from nearly 200 trusted early-access partners before the public release. The responses highlight a consistent theme: GPT-5.5 doesn’t just perform better on benchmarks — it changes how engineers and professionals approach their work.

Dan Shipper, founder and CEO of Every, called GPT-5.5 “the first coding model I’ve used that has serious conceptual clarity.” He tested the model by asking it to diagnose and fix a broken application — a task where GPT-5.4 failed but GPT-5.5 produced the same kind of architectural rewrite that one of his best engineers eventually arrived at after days of debugging.

Michael Truell, co-founder and CEO of Cursor, reported that “GPT-5.5 is noticeably smarter and more persistent than GPT-5.4, with stronger coding performance and more reliable tool use. It stays on task for significantly longer without stopping early, which matters most for the complex, long-running work our users delegate to Cursor.”

Pietro Schirano, CEO of MagicPath, described a scenario where GPT-5.5 resolved a complex merge conflict between two heavily changed branches — hundreds of frontend and refactor changes — in a single pass in approximately 20 minutes.

Justin Boitano, VP of Enterprise AI at NVIDIA, said the model enables teams to “ship end-to-end features from natural language prompts, cut debug time from days to hours, and turn weeks of experimentation into overnight progress in complex codebases.”

One engineer at NVIDIA went further, stating that “losing access to GPT-5.5 feels like I’ve had a limb amputated” — a striking endorsement of how quickly the model has become indispensable to their workflow.

Scientific Research: From Code Assistant to Co-Scientist

Beyond software engineering, GPT-5.5 shows meaningful advances in scientific research workflows. The model can now explore ideas, gather evidence, test assumptions, interpret results, and decide what to try next — a full research loop that previously required human direction at every stage.

In perhaps the most remarkable example, an internal version of GPT-5.5 with a custom harness discovered a new proof about Ramsey numbers — a central topic in combinatorics — that was later formally verified in Lean. Results in this area are rare and technically difficult, making this a concrete example of AI contributing original mathematical reasoning, not just code or explanation.

Bartosz Naskręcki, assistant professor of mathematics at Adam Mickiewicz University in Poland, used GPT-5.5 in Codex to build an algebraic geometry visualization app from a single prompt in 11 minutes, complete with Riemann-Roch theorem computations and interactive 3D rendering.

In biomedical research, Derya Unutmaz, an immunology professor at the Jackson Laboratory for Genomic Medicine, used GPT-5.5 Pro to analyze a gene-expression dataset with 62 samples and nearly 28,000 genes, producing a detailed research report that would have taken his team months to complete manually.

Availability, Pricing, and the API Roadmap

GPT-5.5 is currently available through two primary channels:

  • ChatGPT: GPT-5.5 Thinking is available to Plus, Pro, Business, and Enterprise users. GPT-5.5 Pro is available to Pro, Business, and Enterprise users.
  • Codex: GPT-5.5 is available for Plus, Pro, Business, Enterprise, Edu, and Go plan subscribers with a 400K context window. It also offers a Fast mode that generates tokens 1.5x faster for 2.5x the cost.
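The Fast mode trade-off quoted above can be made concrete with a rough Python sketch. It assumes that a task's wall-clock time is dominated by token generation and that cost scales linearly with the quoted multiplier. Neither assumption comes from OpenAI, and `fast_mode_tradeoff` and its cost units are hypothetical.

```python
# Multipliers as quoted in the article for Codex Fast mode.
FAST_SPEEDUP = 1.5          # tokens are generated 1.5x faster
FAST_COST_MULTIPLIER = 2.5  # at 2.5x the cost


def fast_mode_tradeoff(baseline_minutes, baseline_cost):
    """Estimate time saved and extra cost when switching a task to Fast mode.

    Assumes generation time dominates the task (an illustrative assumption);
    baseline_cost can be in any unit, such as dollars or plan credits.
    """
    fast_minutes = baseline_minutes / FAST_SPEEDUP
    fast_cost = baseline_cost * FAST_COST_MULTIPLIER
    return {
        "fast_minutes": fast_minutes,
        "fast_cost": fast_cost,
        "minutes_saved": baseline_minutes - fast_minutes,
        "extra_cost": fast_cost - baseline_cost,
    }


if __name__ == "__main__":
    print(fast_mode_tradeoff(baseline_minutes=30, baseline_cost=10.0))
```

Under these assumptions, Fast mode cuts a generation-bound task's time by one third while costing two and a half times as much, so it suits work where waiting is the bottleneck.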

API access is coming soon. The pricing structure is as follows:

  • gpt-5.5: $5 per 1M input tokens, $30 per 1M output tokens, with a 1M context window
  • gpt-5.5-pro: $30 per 1M input tokens, $180 per 1M output tokens
  • Batch and Flex pricing available at half the standard rate
  • Priority processing available at 2.5x the standard rate

While GPT-5.5 is priced higher than GPT-5.4 on a per-token basis, OpenAI emphasized that its superior token efficiency means most users will see better results with fewer tokens consumed.
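To see how the announced rates translate into a bill, here is a minimal Python sketch of a cost estimator. The per-token prices and tier multipliers are the ones listed above. The function name `estimate_cost`, the tier labels, and the assumption that the Batch, Flex, and priority multipliers apply uniformly to both models are illustrative, since API access had not launched at the time of writing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Announced price in US dollars per 1M tokens."""
    input_per_million: float
    output_per_million: float


# Standard rates as listed in the article.
RATES = {
    "gpt-5.5": Rate(input_per_million=5.0, output_per_million=30.0),
    "gpt-5.5-pro": Rate(input_per_million=30.0, output_per_million=180.0),
}

# Multipliers relative to the standard rate, as listed in the article.
# Applying them identically to both models is an assumption.
TIER_MULTIPLIER = {
    "standard": 1.0,
    "batch": 0.5,
    "flex": 0.5,
    "priority": 2.5,
}


def estimate_cost(model, input_tokens, output_tokens, tier="standard"):
    """Return the estimated cost in US dollars for one request or job."""
    rate = RATES[model]
    standard_cost = (
        input_tokens * rate.input_per_million
        + output_tokens * rate.output_per_million
    ) / 1_000_000
    return standard_cost * TIER_MULTIPLIER[tier]


if __name__ == "__main__":
    for tier in TIER_MULTIPLIER:
        cost = estimate_cost("gpt-5.5", 200_000, 20_000, tier)
        print(f"gpt-5.5 {tier}: ${cost:.2f}")
```

For example, a job with 200,000 input tokens and 20,000 output tokens comes to $1.60 at the standard gpt-5.5 rate, $0.80 on Batch or Flex, and $4.00 with priority processing. The same job on gpt-5.5-pro would cost $9.60 at the standard rate.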

Safety: Stricter Cybersecurity Safeguards

OpenAI acknowledged that GPT-5.5 represents a step up in cybersecurity capabilities compared to GPT-5.4 and has deployed what it calls “industry-leading safeguards” for this level of capability. The company expanded its Trusted Access for Cyber program, allowing verified defenders to access GPT-5.5’s advanced cybersecurity capabilities with fewer restrictions. Organizations defending critical infrastructure can apply for access to cyber-permissive models like GPT-5.4-Cyber under strict security requirements.

Under OpenAI’s Preparedness Framework, GPT-5.5’s biological/chemical and cybersecurity capabilities are classified as “High.” While the model did not reach the “Critical” threshold for cybersecurity, the company has deployed tighter controls around higher-risk activities and added protections against repeated misuse.

What This Means for the AI Industry

GPT-5.5’s release intensifies the competition between OpenAI and Anthropic, which recently launched its own Claude Cowork agent and Claude Opus 4.7 model. On Terminal-Bench 2.0, GPT-5.5 (82.7%) significantly outperforms Claude Opus 4.7 (69.4%), though Claude Opus 4.7 still leads on SWE-Bench Pro (64.3% vs. 58.6%). With each lab ahead on different benchmarks, the frontier model race remains highly dynamic.

Perhaps more importantly, GPT-5.5 signals a strategic shift: OpenAI is no longer optimizing primarily for conversational quality but for agentic capability — the ability to plan, execute, verify, and persist through complex tasks with minimal human intervention. This positions GPT-5.5 as the foundation for what OpenAI envisions as a unified AI “super app” experience, where a single model handles everything from coding to scientific research to enterprise workflows.

Key Takeaways

  • GPT-5.5 posts a state-of-the-art 82.7% on Terminal-Bench 2.0, along with 84.9% on GDPval and 98.0% on Tau2-bench Telecom
  • It matches GPT-5.4’s per-token latency while using fewer tokens per task, which OpenAI says offsets its higher per-token price
  • Real-world testers report significant improvements in code understanding, long-horizon task persistence, and autonomous tool use
  • An internal version of the model, run with a custom harness, discovered a new proof about Ramsey numbers that was later formally verified in Lean
  • API access is coming soon, with gpt-5.5 priced at $5/1M input tokens and $30/1M output tokens
  • Stricter cybersecurity safeguards are in place, with expanded trusted access for verified defenders
  • Co-designed for, trained with, and served on NVIDIA GB200 and GB300 NVL72 systems, with token generation more than 20% faster thanks to Codex-written load-balancing heuristics

As AI transitions from a tool that responds to prompts to an agent that autonomously executes work, GPT-5.5 represents one of the clearest milestones yet in that journey. Whether it maintains its benchmark lead as competitors respond remains to be seen — but for now, OpenAI has set a new standard for what agentic AI can accomplish.
