AI Still Lags Behind Human Workers in This Key Skill: OpenAI Researchers

OpenAI researchers have acknowledged that even the most advanced AI models are still unable to outperform human coders. This directly contradicts CEO Sam Altman’s claim that AI could replace “low-level” software engineers by the end of this year.

AI Benchmarking for Coding Tasks

In a newly published study, OpenAI found that its cutting-edge AI models struggle with complex coding tasks and remain unreliable for real-world software development.

The researchers used a newly developed benchmark called SWE-Lancer, built from more than 1,400 software engineering tasks sourced from the freelancing platform Upwork. The study evaluated OpenAI’s o1 reasoning model and GPT-4o, alongside competitor Anthropic’s Claude 3.5 Sonnet, to assess their ability to handle coding challenges.

Developers assigned these AI models tasks that required them to either fix bugs and implement changes or manage higher-level decision-making processes in software engineering. However, unlike human engineers, the AI models lacked internet access, preventing them from retrieving answers from existing online sources.

AI Struggles With Software Engineering

Despite being given tasks valued at hundreds of thousands of dollars, the AI models consistently failed to detect the root causes of bugs in larger projects. Instead, they could only address surface-level issues without fully grasping the broader software structure.

Although the AI models worked significantly faster than human coders, their solutions were often incorrect or incomplete. Among the tested models, Claude 3.5 Sonnet outperformed OpenAI’s offerings, yet its responses remained largely inaccurate. The study concluded that these models lack the reliability needed for real-world coding applications.

AI Is Not Ready to Replace Human Engineers

The findings reaffirm that while AI models have made remarkable progress, they are still far from being a viable replacement for human coders. Their inability to manage complex software tasks shows that they function more like assistants than full-fledged engineers.


