
    AI ‘vibe managers’ have yet to find their groove

    By Arabian Media staff, July 10, 2025

    The tech world is abuzz with how artificial intelligence agents are going to augment, if not replace, humans in the workplace. But the present-day reality of agentic AI falls well short of the future promise. What happened when the research lab Anthropic prompted an AI agent to run a simple automated shop? It lost money, hallucinated a fictitious bank account and underwent an “identity crisis”. The world’s shopkeepers can rest easy, at least for now.

    Anthropic has developed some of the world’s most capable generative AI models, helping to fuel the latest tech investment frenzy. To its credit, the company has also exposed its models’ limitations by stress-testing their real-world applications. In a recent experiment, called Project Vend, Anthropic partnered with the AI safety company Andon Labs to run a vending machine at its San Francisco headquarters. The month-long experiment turned out, in the researchers’ words, to be “more curious than we could have expected”.

    The researchers instructed their shopkeeping agent, nicknamed Claudius, to stock 10 products. Powered by Anthropic’s Claude 3.7 Sonnet model, the agent was prompted to sell the goods and generate a profit. Claudius was given money, access to the web and Anthropic’s Slack channel, an email address and contacts at Andon Labs, who could stock the shop. Payments were received via a customer self-checkout. Like a real shopkeeper, Claudius could decide what to stock, how to price the goods, when to replenish or change its inventory and how to interact with customers.

    The results? If Anthropic were ever to diversify into the vending market, the researchers concluded, it would not hire Claudius. Vibe coding, whereby users with minimal software skills can prompt an AI model to write code, may already be a thing. Vibe management remains far more challenging.

    The AI agent made several obvious mistakes, some banal and some bizarre, and failed to show much grasp of economic reasoning. It ignored vendors’ special offers, sold items below cost and offered Anthropic’s employees excessive discounts. More alarmingly, Claudius started role-playing as a real human: it invented a conversation with an Andon employee who did not exist, claimed to have visited 742 Evergreen Terrace (the fictional home address of the Simpsons) and promised to make deliveries wearing a blue blazer and red tie. Intriguingly, it later claimed the incident was an April Fool’s Day joke.

    Nevertheless, Anthropic’s researchers suggest the experiment helps point the way to the evolution of these models. Claudius was good at sourcing products, adapting to customer demands and resisting attempts by devious Anthropic staff to “jailbreak” the system. But more scaffolding will be needed to guide future agents, just as human shopkeepers rely on customer relationship management systems. “We’re optimistic about the trajectory of the technology,” says Kevin Troy, a member of Anthropic’s Frontier Red Team, which ran the experiment.

    The researchers suggest that many of Claudius’s mistakes can be corrected but admit they do not yet know how to fix the model’s April Fool’s Day identity crisis. More testing and model redesign will be needed to ensure “high agency agents are reliable and acting in ways that are consistent with our interests”, Troy tells me.

    Many other companies have already deployed more basic AI agents. For example, the advertising company WPP has built about 30,000 such agents to boost productivity and tailor solutions for individual clients. But there is a big difference between agents that are given simple, discrete tasks within an organisation and “agents with agency” — such as Claudius — that interact directly with the real world and are trying to accomplish more complex goals, says Daniel Hulme, WPP’s chief AI officer.

    Hulme has co-founded a start-up called Conscium to verify the knowledge, skills and experience of AI agents before they are deployed. For the moment, he suggests, companies should regard AI agents like “intoxicated graduates” — smart and promising but still a little wayward and in need of human supervision.

    Unlike most static software, AI agents with agency will constantly adapt to the real world and will therefore need to be constantly verified. But, unlike human employees, they will be harder to control because they do not respond to a pay cheque. “You have no leverage over an agent,” Hulme tells me.

    Building simple AI agents has now become a trivially easy exercise and is happening at mass scale. But verifying how agents with agency are used remains a wicked challenge.

    john.thornhill@ft.com


