OpenAI has announced GPT‑5.2, which it describes as its “most capable model series yet for professional knowledge work.” The company says ChatGPT Enterprise users already report saving 40–60 minutes per day with AI, while heavy users save more than 10 hours per week. GPT‑5.2 is designed to expand this impact by improving spreadsheets, presentations, code, image understanding, long‑context tasks, tool use, and complex multi‑step projects.
OpenAI states that GPT‑5.2 sets a new state of the art on multiple benchmarks, including GDPval, which measures well‑specified knowledge‑work tasks across 44 occupations. GPT‑5.2 Thinking beats or ties top industry professionals in 70.9% of GDPval comparisons, while GPT‑5.2 Pro reaches 74.1%. OpenAI calls GPT‑5.2 Thinking its first model to perform at or above human expert level on this benchmark and says it produces outputs more than 11 times faster than expert professionals and at under 1% of their cost, based on historical metrics.
GDPval tasks include sales presentations, accounting spreadsheets, urgent‑care schedules, manufacturing diagrams, and short videos. A GDPval judge described one GPT‑5.2 output as “an exciting and noticeable leap in output quality” that “appears to have been done by a professional company with staff,” while still noting minor errors.
On internal junior investment‑banking spreadsheet tasks, GPT‑5.2 Thinking scores 68.4%, up from 59.1% for GPT‑5.1, while GPT‑5.2 Pro scores 71.7%. These tasks include three‑statement models and leveraged buyout models for take‑private deals.
For coding, GPT‑5.2 Thinking scores 55.6% on SWE‑Bench Pro, 80.0% on SWE‑bench Verified, and 74.6% on SWE‑Lancer IC Diamond, all above GPT‑5.1 Thinking. OpenAI says GPT‑5.2 more reliably debugs production code, implements feature requests, refactors large codebases, and ships end‑to‑end fixes, with stronger front‑end performance including complex and 3D interfaces.
From single prompts, GPT‑5.2 has produced an “Ocean Wave Simulation” app, a holiday card builder, and a typing‑rain game. Early testers such as Windsurf, Warp, JetBrains, Augment Code, Cline, Charlie Labs, Kilo, and Azad describe GPT‑5.2 as state‑of‑the‑art for “agentic” coding. Windsurf CEO Jeff Wang calls it “the biggest leap for GPT models in agentic coding since GPT‑5.”
OpenAI says GPT‑5.2 Thinking hallucinates less than GPT‑5.1 Thinking. On de‑identified ChatGPT queries, responses containing at least one error are about 30% less common in relative terms. With search and maximum reasoning, GPT‑5.2 Thinking answers 93.9% of questions without errors, versus 91.2% for GPT‑5.1 Thinking; without search, it scores 88.0% versus 87.3%. OpenAI notes that GPT‑5.2 remains imperfect and urges users to double‑check its output for critical work.
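To see how a relative reduction differs from the percentage-point gaps in the error-free rates quoted above, here is a rough calculation in Python. It uses only the figures in this article, and the function name is made up for illustration; it is not OpenAI's methodology, and OpenAI's 30% figure may have been computed on a different set of queries.

```python
def relative_error_reduction(old_error_free_pct: float, new_error_free_pct: float) -> float:
    """Return the relative drop in error rate, as a percentage.

    Both arguments are the share of answers WITHOUT errors, in percent.
    """
    old_error = 100.0 - old_error_free_pct  # error rate of the older model
    new_error = 100.0 - new_error_free_pct  # error rate of the newer model
    return (old_error - new_error) / old_error * 100.0


# Error-free rates quoted in the article (GPT-5.1 Thinking, GPT-5.2 Thinking).
with_search = relative_error_reduction(91.2, 93.9)
without_search = relative_error_reduction(87.3, 88.0)

print(f"with search: {with_search:.1f}%")        # with search: 30.7%
print(f"without search: {without_search:.1f}%")  # without search: 5.5%
```

On these numbers, the with-search gap of 2.7 percentage points works out to roughly a 31% relative drop in answers with errors, in line with the "30%" figure, while the without-search gap of 0.7 points is only about a 5.5% relative drop.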
For long‑context reasoning, GPT‑5.2 Thinking sets a new state of the art on MRCRv2. On the “4‑needle” variant at up to 256,000 tokens, OpenAI says it is the first model it has seen reach near‑100% accuracy and that it consistently outperforms GPT‑5.1 Thinking from 4K to 256K tokens. GPT‑5.2 Thinking also scores higher on long‑context BrowseComp and GraphWalks. OpenAI says this enables use on long reports, contracts, research papers, transcripts, and multi‑file projects, and it pairs GPT‑5.2 Thinking with a new /compact Responses endpoint to extend effective context.
OpenAI describes GPT‑5.2 Thinking as its strongest vision model so far, citing higher scores than GPT‑5.1 Thinking on CharXiv Reasoning and ScreenSpot‑Pro, and better spatial understanding in image examples. On tool‑use benchmarks such as τ2‑bench Telecom and Retail, BrowseComp, Scale MCP‑Atlas, and Toolathlon, GPT‑5.2 Thinking and GPT‑5.2 Pro also outperform GPT‑5.1 Thinking.
On science and math tests, GPT‑5.2 Pro and GPT‑5.2 Thinking improve GPQA Diamond and FrontierMath scores over GPT‑5.1 Thinking. On abstract reasoning benchmarks ARC‑AGI‑1 and ARC‑AGI‑2, GPT‑5.2 Pro and GPT‑5.2 Thinking also post higher results, with GPT‑5.2 Pro crossing 90% on ARC‑AGI‑1.
OpenAI is rolling out GPT‑5.2 Instant, Thinking, and Pro to paid ChatGPT plans and the API, with GPT‑5.1 remaining as a legacy model for three months. The company says GPT‑5.2 is part of an ongoing push to improve general intelligence, long‑context understanding, tool use, vision, safety, and reliability.