OpenAI’s GPT‑5.2 Is Here and It Beats or Ties Human Experts on Most GDPval Knowledge‑Work Tasks

OpenAI has announced GPT‑5.2, which it describes as its “most capable model series yet for professional knowledge work.” The company says ChatGPT Enterprise users already report saving 40–60 minutes per day with AI, while heavy users save more than 10 hours per week. GPT‑5.2 is designed to expand this impact by improving spreadsheets, presentations, code, image understanding, long‑context tasks, tool use, and complex multi‑step projects.

Knowledge‑Work and Coding Benchmarks

OpenAI states that GPT‑5.2 sets a new state of the art on multiple benchmarks, including GDPval, which measures well‑specified knowledge‑work tasks across 44 occupations. GPT‑5.2 Thinking beats or ties top industry professionals in 70.9% of GDPval comparisons, while GPT‑5.2 Pro reaches 74.1%. OpenAI calls GPT‑5.2 Thinking its first model performing at or above human expert level on this benchmark and says it produces outputs more than 11 times faster and at under 1% of expert cost, based on historical metrics.

GDPval tasks include sales presentations, accounting spreadsheets, urgent‑care schedules, manufacturing diagrams, and short videos. A GDPval judge described one GPT‑5.2 output as “an exciting and noticeable leap in output quality” that “appears to have been done by a professional company with staff,” while still noting minor errors.

On internal junior investment‑banking spreadsheet tasks, GPT‑5.2 Thinking scores 68.4%, up from 59.1% for GPT‑5.1, while GPT‑5.2 Pro scores 71.7%. These tasks include three‑statement models and leveraged buyout models for take‑private deals.

For coding, GPT‑5.2 Thinking scores 55.6% on SWE‑Bench Pro, 80.0% on SWE‑bench Verified, and 74.6% on SWE‑Lancer IC Diamond, all above GPT‑5.1 Thinking. OpenAI says GPT‑5.2 more reliably debugs production code, implements feature requests, refactors large codebases, and ships end‑to‑end fixes, with stronger front‑end performance including complex and 3D interfaces.

From single prompts, GPT‑5.2 has produced an “Ocean Wave Simulation” app, a holiday card builder, and a typing‑rain game. Early testers such as Windsurf, Warp, JetBrains, Augment Code, Cline, Charlie Labs, Kilo, and Azad describe GPT‑5.2 as state‑of‑the‑art for “agentic” coding. Windsurf CEO Jeff Wang calls it “the biggest leap for GPT models in agentic coding since GPT‑5.”

Factuality, Long‑Context Reasoning and Vision

OpenAI says GPT‑5.2 Thinking hallucinates less than GPT‑5.1 Thinking. On de‑identified ChatGPT queries, answers containing at least one error are about 30% less common in relative terms. With search and maximum reasoning, GPT‑5.2 Thinking answers 93.9% of questions without errors, versus 91.2% for GPT‑5.1 Thinking; without search, it scores 88.0% versus 87.3%. OpenAI notes that GPT‑5.2 remains imperfect and urges users to double‑check its output for critical work.
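The gap between a few percentage points of accuracy and a "30% relative" drop in errors is easier to see with the arithmetic written out. The short Python sketch below is illustrative only: it assumes the error rate is simply 100% minus the error‑free share quoted above, and the function name is hypothetical rather than anything published by OpenAI. It is not confirmed that these accuracy figures come from the same evaluation as the 30% claim.

```python
def relative_error_reduction(old_correct_pct: float, new_correct_pct: float) -> float:
    """Return the relative drop in error rate, in percent.

    Assumption: error rate = 100 - share of answers without errors.
    """
    old_error = 100.0 - old_correct_pct
    new_error = 100.0 - new_correct_pct
    return (old_error - new_error) / old_error * 100.0


# Figures quoted in the article (GPT-5.1 Thinking vs GPT-5.2 Thinking).
with_search = relative_error_reduction(91.2, 93.9)
without_search = relative_error_reduction(87.3, 88.0)

print(f"with search:    {with_search:.1f}% fewer answers with errors")
print(f"without search: {without_search:.1f}% fewer answers with errors")
```

Under that assumption, the 2.7‑point gain with search works out to roughly a 31% relative reduction in answers containing an error, while the 0.7‑point gain without search is closer to 5.5%.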

For long‑context reasoning, GPT‑5.2 Thinking sets a new state of the art on MRCRv2. On the “4‑needle” variant, with inputs of up to 256,000 tokens, OpenAI says it is the first model it has seen reach near‑100% accuracy and that it consistently outperforms GPT‑5.1 Thinking from 4K to 256K tokens. GPT‑5.2 Thinking also scores higher on long‑context BrowseComp and GraphWalks. OpenAI says this makes the model usable on long reports, contracts, research papers, transcripts, and multi‑file projects, and it pairs GPT‑5.2 Thinking with a new /compact Responses endpoint to extend effective context.

Science, Reasoning and Rollout

On science and math tests, GPT‑5.2 Pro and GPT‑5.2 Thinking score higher than GPT‑5.1 Thinking on GPQA Diamond and FrontierMath. On the abstract reasoning benchmarks ARC‑AGI‑1 and ARC‑AGI‑2, both models also post higher results, with GPT‑5.2 Pro crossing 90% on ARC‑AGI‑1.

OpenAI is rolling out GPT‑5.2 Instant, Thinking, and Pro to paid ChatGPT plans and the API, with GPT‑5.1 remaining as a legacy model for three months. The company says GPT‑5.2 is part of an ongoing push to improve general intelligence, long‑context understanding, tool use, vision, safety, and reliability.