OpenAI launches GPT-5.5 with 82.7% on Terminal-Bench 2.0, can handle tasks end-to-end

New Delhi: OpenAI on Thursday unveiled GPT-5.5, calling it its smartest and most intuitive model yet and claiming that it is the next step toward letting AI actually do the work, not just talk about it. Unlike earlier versions that needed careful step-by-step instructions, GPT-5.5 can take on messy, multi-part tasks from start to finish, according to the company's press release.

“You can hand it a vague project and trust it to plan, use tools, check its own work, navigate ambiguity and keep going until it’s done. The model excels at writing and debugging code, researching online, analyzing data, building documents and spreadsheets, and even operating software across different apps,” the press release said.

The company said the biggest leap is in agentic coding and computer use. On Terminal-Bench 2.0, which tests complex command-line workflows requiring planning and tool coordination, GPT-5.5 scores 82.7%, a new state of the art. On SWE-Bench Pro, which evaluates real-world GitHub issue resolution, it scores 58.6%, solving more tasks end-to-end in a single pass than previous models. On OpenAI’s internal Expert-SWE benchmark for 20-hour coding projects, it also outperforms GPT-5.4. Crucially, the company said, it does all this while using fewer tokens, making it both more capable and more efficient. On Artificial Analysis’s Coding Index, GPT-5.5 delivers frontier-level intelligence at roughly half the cost of competing models.

The press release said GPT-5.5 was co-designed for and served on NVIDIA GB200 and GB300 NVL72 systems, with Codex helping engineers test and optimize the stack itself. One key improvement was dynamic load balancing. Where requests had previously been split into fixed chunks, Codex analyzed weeks of production traffic to devise smarter partitioning algorithms, boosting token generation speeds by over 20%.
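
The press release does not describe the partitioning algorithm itself, so the following is only a rough illustration of the general idea, not OpenAI's method. It is a minimal Python sketch that contrasts fixed, equal-count chunking with a simple load-aware scheme (greedy "largest request first, to the least-loaded worker"). The function names, the worker count and the per-request token costs are all hypothetical.

```python
import heapq


def fixed_chunks(costs, workers):
    """Split requests into contiguous, equal-count chunks, ignoring their cost."""
    size = -(-len(costs) // workers)  # ceiling division
    return [costs[i:i + size] for i in range(0, len(costs), size)]


def balanced_partition(costs, workers):
    """Greedy load-aware split: largest request first, to the least-loaded worker."""
    heap = [(0, w) for w in range(workers)]  # (current load, worker index)
    bins = [[] for _ in range(workers)]
    for cost in sorted(costs, reverse=True):
        load, w = heapq.heappop(heap)
        bins[w].append(cost)
        heapq.heappush(heap, (load + cost, w))
    return bins


def max_load(bins):
    """The busiest worker determines how long the whole batch takes."""
    return max(sum(b) for b in bins)


# Hypothetical per-request token costs (assumption, not data from the release).
costs = [900, 850, 800, 100, 90, 80, 70, 60]

print(max_load(fixed_chunks(costs, 4)))        # busiest worker under fixed chunks
print(max_load(balanced_partition(costs, 4)))  # busiest worker under balancing
```

With these made-up numbers, fixed chunking puts the two most expensive requests on the same worker, while the load-aware split spreads them out, so the busiest worker finishes sooner. A production system would presumably estimate costs from observed traffic, which is where weeks of production data could come in.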

For knowledge work, GPT-5.5 behaves more like a capable assistant than a chatbot. It’s better at finding information, extracting what matters, using tools and turning raw inputs into polished outputs. In Codex, it now generates higher-quality documents, spreadsheets and presentations. OpenAI’s own teams are already using it across finance, comms, marketing and product. The company said its finance team used it to review 24,771 K-1 tax forms (71,637 pages in total), cutting two weeks off the process. The comms team built a scoring framework for speaking requests and validated an automated Slack agent that now handles low-risk requests without human intervention.

In ChatGPT, GPT-5.5 Thinking delivers faster, more concise answers for complex problems, while GPT-5.5 Pro offers a noticeable step up in quality for demanding work in business, legal, education and data science. The model scores 84.9% on GDPval for multi-occupation knowledge work, 78.7% on OSWorld-Verified for operating real computer environments, and 98% on Tau2-bench Telecom for customer-service workflows without prompt tuning.

OpenAI says GPT-5.5 comes with its strongest safeguards yet, including tighter controls for high-risk cybersecurity requests and expanded testing with external red teamers. Cybersecurity and biology capabilities are classified as “High” under its Preparedness Framework, though not yet “Critical.” To balance access with safety, OpenAI is launching Trusted Access for Cyber, giving verified defenders expanded use of cyber-permissive models like GPT-5.4-Cyber for legitimate security work.

GPT-5.5 is rolling out now to Plus, Pro, Business and Enterprise users in ChatGPT and Codex, with GPT-5.5 Pro available to Pro, Business and Enterprise users. API access is coming soon, pending further safety and security reviews.
