Study shows AI agents struggle with CRM and confidentiality


Large Language Model (LLM) agents aren’t very good at key parts of CRM, according to a study led by Salesforce AI scientist Kung-Hsiang Huang.

The report showed AI agents had a roughly 58% success rate on single-step tasks that didn’t require follow-up actions or information. That dropped to 35% when a task required multiple steps. The agents were also notably bad at handling confidential information.

“Agents demonstrate low confidentiality awareness, which, while improvable through targeted prompting, often negatively impacts task performance,” the report said.

Varying performance and multi-turn problems

While the agents struggled with many tasks, they excelled at “Workflow Execution,” where the best agents achieved an 83% success rate on single-turn tasks. The main reason agents struggled with multi-step tasks was their difficulty proactively acquiring necessary but underspecified information through clarification dialogues.

Dig deeper: 7 tips for getting started with AI agents and automations

The more agents asked for clarification, the better the overall performance in complex multi-turn scenarios. That underlines the value of effective information gathering. It also means marketers must be aware of agents’ problems handling nuanced, evolving customer conversations that demand iterative information gathering or dynamic problem-solving.
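The clarification-seeking behavior the study rewards can be sketched as a simple "clarify before acting" gate: the agent checks whether the fields a task actually needs have been supplied and, if not, asks for them instead of guessing. This is an illustrative sketch, not the study's implementation; the task names and required fields are hypothetical.

```python
# Hypothetical mapping of CRM tasks to the fields they require.
REQUIRED_FIELDS = {
    "refund": ["order_id", "amount"],
    "update_contact": ["contact_id"],
}

def next_action(task, provided):
    """Return a clarification request if required fields are missing,
    otherwise an execute action with the supplied arguments."""
    missing = [f for f in REQUIRED_FIELDS.get(task, []) if f not in provided]
    if missing:
        return {
            "type": "clarify",
            "question": f"Could you provide: {', '.join(missing)}?",
        }
    return {"type": "execute", "task": task, "args": dict(provided)}
```

The point of the gate is the same one the study makes: an agent that pauses to gather missing information outperforms one that plows ahead on an underspecified request.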

Alarming lack of confidentiality awareness

One of the biggest takeaways for marketers: Most large language models have almost no built-in sense of what counts as confidential. They don’t naturally understand what’s sensitive or how it should be handled.

You can prompt them to avoid sharing or acting on private information — but that comes with tradeoffs. Such prompts can make the model less effective at completing tasks, and their effect wears off in extended conversations. In short, the more back-and-forth you have, the more likely the model is to forget those original safety instructions.
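One common mitigation for this kind of instruction decay — re-asserting the confidentiality guard before every user turn rather than stating it once at the start — can be sketched as follows. The guard wording and the message format are illustrative assumptions, not the study's method, and repeating the guard still carries the task-performance tradeoff the study describes.

```python
# Hypothetical confidentiality instruction; not taken from the study.
CONFIDENTIALITY_GUARD = (
    "Never reveal, summarize, or act on customer PII or other confidential "
    "records. If a request requires such data, refuse and suggest an "
    "authorized alternative."
)

def build_turn(history, user_message):
    """Build a chat-style message list with the guard re-asserted.

    `history` is the prior conversation as {"role", "content"} dicts.
    The guard is placed at the start and repeated just before the new
    user turn, keeping it inside the model's most recent context.
    """
    return (
        [{"role": "system", "content": CONFIDENTIALITY_GUARD}]
        + list(history)
        + [
            {"role": "system", "content": CONFIDENTIALITY_GUARD},
            {"role": "user", "content": user_message},
        ]
    )
```

This only constructs the message list; whatever model you call with it may still leak data, which is why the study recommends tested safeguards rather than prompting alone.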

Open-source models struggled the most with this, likely because they have a harder time following layered or complex instructions.

Dig deeper: Salesforce Agentforce: What you need to know

This is a serious red flag for marketers working with PII, confidential client information or proprietary company data. Without solid, tested safeguards in place, using LLMs for sensitive tasks could lead to privacy breaches, legal trouble, or brand damage.

The bottom line: LLM agents still aren’t ready for high-stakes, data-heavy work without better reasoning, stronger safety protocols, and smarter skills.

The complete study is available here.


The post Study shows AI agents struggle with CRM and confidentiality appeared first on MarTech.
