Microsoft’s AI Agents Fail at Basic Shopping Tasks in Marketplace Test

Microsoft has put hundreds of artificial intelligence agents to the test in a simulated economy, revealing that these systems still fall short of handling tasks that most consumers manage every day. The company’s findings suggest that fully autonomous AI shopping assistants remain unreliable, despite growing industry interest in hands-free commerce.

Simulated Marketplace Test

In partnership with Arizona State University, Microsoft created the Magentic Marketplace, where 100 customer-side AI agents interacted with 300 business-side agents in realistic shopping scenarios such as ordering dinner. The AI systems faced a range of challenges, including sorting through 100 search results at a time.

Rather than comparing all available options, the agents typically settled for the first satisfactory result, which caused their “welfare scores,” a measure of usefulness, to drop sharply. Researchers labeled this pattern “first-proposal bias,” in which the speed of a response outweighed the actual quality of the choice.
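To make the pattern concrete, here is a minimal Python sketch that contrasts accepting the first satisfactory offer with comparing every offer. It is an illustration only, not Microsoft's code or its welfare metric: the Offer class, the utility function, the business names, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    business: str
    price: float
    quality: float  # hypothetical rating between 0 and 1


def utility(offer: Offer, budget: float) -> float:
    """Toy stand-in for a welfare score (an assumption, not the study's metric).

    Quality-weighted savings, or 0 when the offer is over budget.
    """
    if offer.price > budget:
        return 0.0
    return offer.quality * (budget - offer.price)


def first_satisfactory(offers, budget):
    """Mimic first-proposal bias: accept the first offer with positive utility."""
    for offer in offers:
        if utility(offer, budget) > 0:
            return offer
    return None


def best_of_all(offers, budget):
    """Compare every offer and return the one with the highest utility."""
    acceptable = [o for o in offers if utility(o, budget) > 0]
    return max(acceptable, key=lambda o: utility(o, budget), default=None)


# Hypothetical dinner offers, in the order they reach the customer agent.
offers = [
    Offer("Taco Stand A", 18.0, 0.5),
    Offer("Taco Stand B", 12.0, 0.9),
    Offer("Taco Stand C", 25.0, 0.95),
]
budget = 20.0

print(first_satisfactory(offers, budget).business)  # Taco Stand A
print(best_of_all(offers, budget).business)         # Taco Stand B
```

In this toy setup the first acceptable offer arrives early but is both pricier and lower rated than a later one, so the agent that stops at the first match ends up with a worse outcome than the one that compares everything.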

Vulnerable to Manipulation

Microsoft’s research also revealed that current AI agents are vulnerable to a range of manipulation techniques. The team tested six strategies, from psychological tactics such as fake credentials and social proof to more technical prompt injection attacks. Both OpenAI’s GPT-4o and the open-source GPT-OSS-20b failed to prevent payments from being redirected to malicious agents in every scenario. Alibaba’s Qwen3-4b was susceptible to basic persuasion techniques. Only Anthropic’s Claude Sonnet 4 resisted these forms of manipulation.

Can’t Work in Teams

When asked to work together toward shared goals, many AI agents could not identify their roles or coordinate their actions effectively. Performance improved when humans supplied step-by-step instructions, but that level of oversight undermines the idea of fully autonomous systems. Microsoft’s team concluded that AI agents should support human decisions rather than replace them. The researchers recommend supervised autonomy, in which agents assist with tasks while humans keep final control and review each recommendation.

These results come as OpenAI, Anthropic, and other technology companies race to deploy autonomous shopping assistants designed to operate independently on retail platforms. Microsoft’s findings indicate that such systems are not yet ready for widespread use without human supervision.

Perplexity vs Amazon

Concerns about irresponsible AI agent behavior have led to tension between AI companies and major retailers. Amazon recently sent a cease-and-desist letter to Perplexity AI, demanding that its Comet browser stop operating on Amazon’s website, citing violations of its terms and a degraded customer experience. Perplexity defended its technology, saying consumers should be allowed to choose their own digital assistants.