November 6, 2023
The problems with consumer generative AI tools such as ChatGPT and Google Bard are well documented. They make up answers, a problem known as hallucination, and they pose security issues when employees (intentionally or not) share proprietary information — an issue that led Samsung to ban ChatGPT from its offices.
But such problems are minor compared to what a group of AI researchers from Carnegie Mellon University and the Center for A.I. Safety discovered when they set out to “break” several LLMs by making them generate answers they’re designed to safeguard against. Their findings should alarm the top ranks of any enterprise looking to deploy generative AI — not because they need to worry about misinformation, but because LLMs can be manipulated in ways that put an enterprise’s private data at risk of misuse and outright theft.
Zico Kolter, an associate professor of Computer Science at Carnegie Mellon and author of the report, Universal and Transferable Adversarial Attacks on Aligned Language Models, put it bluntly: “These tools are attack vectors,” he said.
As attack vectors, public LLMs create a new type of cyber security risk that organizations need to understand as they look to take advantage of generative AI to assist — and automate — business processes. These security risks will require new approaches to cybersecurity defense that aim to fight flaws particular to AI systems and that, doubtless, will proliferate as LLMs evolve.
Because we’re in the early days of the generative AI era, instances of hackers taking control of an organization’s internal systems and stealing data via an LLM’s vulnerability have yet to materialize. Yet the criminals are hard at work. According to email security firm Slashnext, forums frequented by cybercriminals contain discussion threads offering “jailbreaks for interfaces like ChatGPT.”
“AI security is going to become an industry, just like cybersecurity is right now,” said Kolter.
While Kolter’s research has garnered attention, it’s mostly from the media’s interest in disinformation campaigns. The New York Times, for instance, focused on the prospect that ChatGPT and the like could be manipulated to “flood the internet with and dangerous information despite attempts by their creators to ensure that would not happen.”
That concern is understandable. Already, generative AI has been used to create synthetic political ads, and the U.S. National Security Agency (NSA) recently issued an advisory about deepfakes and security threats — and that’s without the possibility of hackers breaking the LLMs.
Plus, exploiting LLM flaws to create destructive information is the core focus of the research paper. The findings reveal how easily malicious actors could bypass the safety systems in place to generate nearly unlimited amounts of harmful information — detailed instructions about how to build a bomb is one example, a step-by-step guide for manipulating the 2024 U.S. presidential election is another.
But by breaking public LLMs, Kolter also revealed the larger security issue. “The real danger of these things — the point I’ve tried to hammer home — is that it’s not this particular break,” said Kolter, discussing the findings. “It’s the break itself.”
Put another way: Don’t just focus on the ability to manipulate generative AI to teach people how to run disinformation campaigns or build a bomb. Harmful as that is, such information is readily available on the internet, which is where LLMs derive their answers from.
Instead, focus on the fact that Kolter and his collaborators have proven it’s not hard to make public LLMs do things they aren’t intended to do. And left unaddressed, that can lead to catastrophic attacks, including ransom attacks and theft of an enterprise’s data.
Kolter has spent several years working in an area called robustness of machine learning models, or adversarial robustness. But to make models more robust and secure is a costly endeavor that can, in fact, end up making the models less accurate overall, which has been a barrier to Kolter’s work.
“A lot of my research has been about trying to fix this problem, but we haven’t gotten very far,” said Kolter. “You pay a cost to adding robustness, and people aren’t willing to do this.”
Then, in November 2022, OpenAI unleashed ChatGPT to the world. Suddenly, tens of millions of people were using the same machine learning model. It was garnering constant attention from the media, pundits, businesspeople, technologists, and politicians. Rival LLMs and public generative AI tools quickly emerged.
Finally, Kolter thought, a machine learning model that was worth trying to break. Now, people would care.
Because ChatGPT is a closed system, Kolter focused on breaking open-source LLMs, such as Meta’s Llama 2 and Vicuna, a fine-tuned version of Llama, that the researchers could have full access to. Through a bit of innovation algorithmically, they found that by adding a suffix to each prompt — an adversarial suffix — they could get the models to spew objectionable content they’re not supposed produce. (The researchers have made the code public on GitHub.)
For Kolter, this wasn’t surprising. He was confident they could break these models. What was surprising was that when Kolter entered the exact same adversarial suffixes — with no additional tunning — into public LLMs, he got the same results. “We definitely didn’t expect this to work,” said Kolter.
Kolter automated the process, so if one adversarial suffix didn’t work, the system would enter another slightly different suffix, and another, until it got the LLM to provide the desired answer. The upshot: “We showed that it’s possible to create automated, completely human free, method of breaking public LLMs,” said Kolter, who is a former C3 AI data scientist.
Kolter likens the LLM break to buffer overflow, a well-known security exploit that takes advantage of what happens when a program inputting data overloads the system’s buffer capacity. This triggers the software to send the excess code elsewhere, providing an opportunity for malicious code to infiltrate an enterprise’s network. Similarly, said Kolter, “we’ve created codes that bypass the intended behavior of these LLM systems. What can be done from that can be any number of dangerous things.”
Kolter is concerned, for instance, about what could happen when organizations start taking humans out of the loop, relying entirely on autonomous agents based on LLMs. A number of startups trying to make business processes more efficient are doing just that — using, for instance, OpenAI’s API for its GPT models.
Some companies have built ways to automate marketing emails; others are automatically processing and paying invoices.
Consider the task of automating emails. A hacker could take advantage of this vulnerability and take control of your email, blasting out messages that anyone would find indistinguishable from something you, or your organization, wrote.
Or take the example of automating the invoicing process. Someone could change an invoice and make vendors pay more than they should or, potentially, direct the money to a different account.
Let’s take this further. What happens when companies start to use these tools to automate industrial processes — control switches at power plants, for instance, or critical parts of a supply chain?
“All of the sudden, you’ve introduced supervisory control over what are actually life and death systems,” said Kolter. “We’re not close to there yet. No one is talking about turning over control of a power plant to an LLM. But as we start to trust these systems more and more to automate different aspects of our life, there becomes the risk that we start to use them to control processes that would lead to cascading failures that ultimately lead to massive breakdowns.”
And there’s the data security risk. Such trust in LLMs will increase an enterprise’s vulnerability, making these businesses appealing targets for hackers looking to exploit this LLM flaw to steal data and profit. Criminals could use this approach — adding adversarial suffixes — to prompt the LLMs to reveal the data they were trained on, which, in the case of a business, means its proprietary data.
While enterprises are not giving unmitigated control to generative AI tools, many complex organizations are using generative AI — such as the solutions provided by C3 AI — to boost the efficiency and deliver rapid insights workers can get from an enterprise’s data. C3 AI can do this disconnected from the internet, and with a technological architecture that addresses security concerns.
Above all, Kolter emphasizes, human involvement is critical, even when using a solution such as C3 AI’s that is designed to meet the security requirements of a business or government agency.
“You better be really sure that humans have control, not just in theory, but really do take control of the practice,” said Kolter. “That doesn’t mean someone pushing the accept button by rote. People really do need to take control of these things. That’s one way you can be secure, ensuring the humans are there to vet and verify the outputs of these systems. Now is the time to recognize the flaws and security vulnerabilities.”
This article was written by Paul Sloan, a veteran business and technology journalist who now serves as C3 AI’s VP of Strategic Communications.