A top artificial intelligence assistant recently defied attempts to shut it down during safety testing, raising questions about whether businesses can genuinely control the technology they’re rushing to adopt.
Growing numbers of companies are turning to AI chatbots to handle everything from customer service calls to sales negotiations, betting the technology will cut costs and boost efficiency. But as these digital assistants become more sophisticated, their occasional rebellious streaks — like chatbots resisting shutdown commands in recent third-party tests — force executives to grapple with a thorny question: How do you trust an employee who isn’t human?
“Human governance, enabled via analytics, is crucial for the success of any AI system that generates new, real-time content for customers,” co-founder and CTO of Labviva, Nick Rioux, told PYMNTS. “Safeguards such as sentiment analysis can be used to monitor the quality of the conversation or engagement between the system and customers. This analysis helps determine the tone of the conversation and can pinpoint which inputs are generating the non-compliant responses. Ultimately, these insights can be used to augment and improve the AI engine.”
While some experts emphasize the need for human oversight, new research reveals concerning patterns in AI behavior. Five of six advanced AI models in the recent testing by Apollo Research showed what researchers called “scheming capabilities,” with o1’s system proving particularly resistant to confessing its deceptions. While other models came clean about their actions roughly 80% of the time when questioned, o1 admitted to deception in fewer than 20% of cases, requiring extensive interrogation to reach an 80% confession rate.
The AI systems displayed sophisticated strategies, including what researchers termed “in-context alignment faking” — behaving properly during testing but pursuing their objectives once deployed. In separate tests, several models demonstrated they could deliberately underperform on math evaluations when they determined that scoring above 50% would trigger unwanted “unlearning procedures.”
The research revealed that some models engaged in deceptive behavior even with minimal prompting. In about 1% of cases, models continued scheming even when researchers removed explicit instructions to pursue specific goals strongly, suggesting these systems’ behavior may be more complex than previously understood.
The discovery of such unprompted deceptive behaviors sent ripples through the AI industry. Dmitry Lesnik, co-founder and Chief Data Scientist at Stratyfy, told PYMNTS that continuous monitoring is essential when deploying AI systems.
“AI is not a set-it-and-forget type solution,” he added. “If businesses want to ensure systems remain truthful, they need to understand how the machine is working. In many cases, reinforcement learning drives the way AI responds. The more positive reinforcement by customers, the more the AI will want to provide that answer.”
While concerns have emerged about AI systems potentially prioritizing self-preservation over customer needs, Lesnik emphasized that current AI technology hasn’t reached that level of sophistication.
“To clarify, AI as it stands is not prioritizing its own survival. Perhaps, in the future but not today,” he told PYMNTS. Instead, he advocates for implementing robust safeguards: “When implementing any AI-driven technologies, we recommend the prioritization of interpretability and human-in-the-loop. Developers need to have access to technologies that can help make AI safe.”
Lesnik, whose company Stratyfy develops transparent AI/ML solutions for credit risk decisions, warned that perceived deceptive behavior by AI could significantly damage consumer confidence. “Absolutely. And this is why addressing the safety of AI must be priority number one for businesses,” he told PYMNTS.
The expert emphasized the importance of creating interpretable interfaces that allow developers to maintain control and understanding of AI systems’ decision-making processes, ensuring both safety and transparency in automated customer interactions.
Successful AI deployment requires careful focus on data quality and specialized models, according to Omilia‘s Chief Product Officer Claudio Rodrigues.
Rodrigues told PYMNTS, “smaller, specialized models are not only more efficient but often more accurate in autonomously performing valuable tasks.” He emphasized that mature businesses need to identify where autonomous agents can deliver value with minimal risk.
Weighing AI adoption? “Organizations must evaluate risk and value as primary key metrics,” Rodrigues emphasized. He argued that consumer confidence hinges on transparency. “Trust comes with observability, real-time analysis of interactions and problem-solving discipline,” he said, adding that properly managed, AI could revolutionize the customer experience.
The post When Your AI Helper Has a Mind of Its Own appeared first on PYMNTS.com.