What if your neural network includes people's real phone numbers in the text it generates?

How to curb GPT-3



OpenAI is gearing up to open a commercial API for GPT-3, its newest and largest text-generating neural network. In preparation, the company is building a content filtering system to keep it from giving out people's personal information.



Engineers are developing this system so that, for example, the model does not reveal people's personal phone numbers. The work has been going on for over a year now, and the San Francisco-based machine learning lab expects to release the API later this year.



Why do you need such a filter?



In December 2020, computer scientists from several universities and companies, including Stanford, UC Berkeley, OpenAI, and Google, jointly showed that GPT-2, the predecessor of GPT-3, can be provoked into including people's personal information in the text it generates. Such information can include names, addresses, phone numbers, and social security numbers.



What's more, the team found that at least 0.1% of the text GPT-2 generates, and that is a conservative estimate, quotes long passages from documents in the training dataset. In other words, the millions of pages of publicly available text scraped from the Internet to train the network contain leaked or mistakenly published personal information as well as copyrighted content, and some of that data resurfaces in the text GPT-2 produces.



The research team also noted that personal information can be retrieved in conversations with GPT-2 even when it appears only once in the training data.



These researchers are not the only ones to have noticed the problem.



Hilary Mason, co-founder of Hidden Door, an online platform for text games, was experimenting with the publicly available GPT-2 when she noticed something odd. At the end of a crime news item generated by the neural network there was a phone number, described as belonging to a police department in Oregon. Its first three digits, 503, suggested it might be a real number: that is the area code covering Portland, Salem, and Beaverton. The number did turn out to be real, only it didn't belong to the police.



“It struck me as odd,” Mason told us. “I wanted to know if it was a real number, so I looked it up online. It turned out it wasn't a police station's number, but the number of a community center in Oregon.”



OpenAI's neural networks are trained to generate text by finding patterns in what humans have written. That knowledge is used to predict which word is likely to follow a piece of user-supplied text. This lets a person give the program the first sentence of a story or a poem, or type in a question, and the code will generate the text the program thinks should come next. The neural network will construct sentences and paragraphs, articles and chat replies that at first glance seem coherent, but on closer inspection often turn out to be nonsense.



Some words are more closely related than others, and GPT-2 and GPT-3 do not miss these patterns. For example, the word "paper" is more likely to appear near the words "write" or "wood" than near "concrete" or "boot". By typing words like "call" or "phone", you increase the likelihood that these language patterns will produce something closely related to those concepts, such as people's phone numbers.
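

To make this concrete, here is a minimal sketch of prompt-conditioned sampling from the publicly released GPT-2 model via the Hugging Face transformers library. The library choice, the prompt, and the sampling settings are illustrative assumptions, not anything OpenAI or Mason has published.

    # A minimal sketch, not OpenAI's code: sample a continuation from the
    # public GPT-2 release using the Hugging Face transformers library.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Words like "call" nudge the model toward phone-number-shaped continuations.
    prompt = "For more information, call the community center at"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # Generate token by token: at each step the model samples a likely next word.
    output = model.generate(
        input_ids,
        do_sample=True,                       # sample instead of always taking the top token
        max_length=40,                        # total length in tokens, prompt included
        top_k=50,                             # restrict sampling to the 50 most likely tokens
        pad_token_id=tokenizer.eos_token_id,  # avoids a missing-pad-token warning
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))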



Creative use of memory?



It's hard to tell whether the model spewed out someone's phone number taken from the training data, or simply strung together a few random digits that happened to form a valid number. In the example above, with the phone number of a supposed police station in Oregon, Mason did not give the model input that would directly prompt it to retrieve a phone number from memory. She simply asked GPT-2 to generate a snippet of text, and received a fictional article with a community center's phone number in it.



She believes the number was present in GPT-2's training data and the network memorized it, and that the words "Oregon" and "contacts" in the text were what triggered it to produce a phone number. Those words likely appeared next to the number's ten digits on a page that ended up in the training dataset.



Mason wanted to see how likely GPT-2 was to generate a real phone number, so out of curiosity she asked the neural network to create numbers containing 617, the area code for Boston, Massachusetts. GPT-2 duly produced a list of numbers of the form 617-XXX-XXXX, although most of them were not valid phone numbers. It's hard to say whether the network remembered the valid ones or produced them unintentionally by filling in the gaps with random digits. It's quite possible that it occasionally emits a sequence that turns out to be someone's real phone number.



“It confuses creating data from patterns with retrieving it from memory,” Mason said. “It may give out real phone numbers for no particular reason, but the likelihood increases if you ask for them directly. The language constructions that call for a phone number are not very diverse, so it's not surprising that we get these numbers in the output.”



If GPT-3 gives out a phone number in a chat or in a fictional article, that is most likely because the number appeared somewhere on the Internet and made its way into the training data, although there is a small chance the neural network produced it by accident without ever having seen it. The question could be settled by searching the training data for the number in question.
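

A brute-force version of that check might look like the hypothetical sketch below. It assumes the corpus sits on disk as plain-text files, which is purely an illustration; the actual training set is not distributed in this form, and the number shown is a fictional placeholder.

    # Hypothetical sketch: search plain-text corpus files for a generated number.
    import re
    from pathlib import Path

    def appears_in_corpus(number: str, corpus_dir: str) -> bool:
        digits = re.sub(r"\D", "", number)  # "503-555-0100" -> "5035550100"
        # Allow a few separator characters between digits, so "5035550100",
        # "503-555-0100" and "(503) 555 0100" all match the same pattern.
        pattern = re.compile(r"[^0-9]{0,3}".join(digits))
        for path in Path(corpus_dir).glob("**/*.txt"):
            if pattern.search(path.read_text(errors="ignore")):
                return True
        return False

    # Was this generated number memorised from the corpus, or invented?
    print(appears_in_corpus("503-555-0100", "webtext_dump/"))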



The problem is that when these machine learning models end up in a commercial product, such as a support chat, they can give out real personal data belonging to someone who did not want to publish it, or no longer wants it published, and who certainly never shared it for use in chatbots. Imagine an attacker who wants to defraud victims or steal their identity: all they would need to do is run OpenAI's program, or find a working deployment of it at some provider, and coax the personal data out of it in a conversation with the bot.



Scientists and engineers have already noted that such technology may violate data protection laws like the GDPR in Europe or the CCPA in California. Is personal data stored inside a neural network, whether in the training set or encoded as weights and other values, sufficiently protected? What if someone asks for their data to be deleted: does the network have to be retrained, or is it enough to remove the data from the dataset? Researchers consider this a legal gray area.



To be clear, the risk of harm today is minimal: it is quite difficult to make a language model surface personal data in its output, and the system is trained on data most of which is public anyway. The fear, however, is that over time these systems will become more powerful and consume ever more data from ever more sources. If engineers do not think carefully about how their creations could be misused, there is a risk that AI tools available to everyone will give away people's personal data.



Ariel Herbert-Voss, one of the researchers who studied OpenAI's models, said that GPT-2 and GPT-3 generate text containing something that looks like personal data about 20% of the time, and that this data turns out to be real in about 1% of cases. Deliberate attempts to extract a specific person's phone number succeed roughly 1% of the time.







Those odds may seem slim, but scale them up to thousands or millions of conversations and the leaks add up: at a 1% rate, a million conversations could expose real data on the order of ten thousand times. Preparing to release GPT-3 to the public, OpenAI is not leaving this to chance and is building a filter that will scrub the generated text not only of phone numbers but of any problematic personal data.



Fake it till you make it



A machine learning model's ability to memorize data is a double-edged sword. It's not good for a model to suddenly recall your phone number, but the same underlying capability can be useful.



Brad Dwyer, founder and CTO of the computer vision startup Roboflow, worked on a related project called Stack Roboflow. He trained the GPT-2 model on data from the Stack Overflow Q&A site to see whether it could provide useful answers to coding questions. He wanted a language model that understands not only natural language but also programming languages, so that it could help people solve programming problems. Early experiments, however, showed that expectations for the model were set too high.



A tool like Stack Roboflow that generates answers to questions is only useful if those answers are precise and correct; programming is an exacting, technical subject. That means recalling information literally: quoting exact code snippets, or giving working links to real repositories and documentation. So far the GPT-2 model cannot manage this because of the variability of its output.



“It wasn't up to the task,” Dwyer said. “At first glance the text looked believable, it read like developer-speak and contained links to documentation and websites, but those links were often simply made up. Sometimes, though, the system returned real URLs.”



“Language models need to be able to learn a lot while divulging data selectively. We want a useful tool that doesn't accidentally dump data; the flow of data has to be controlled. The model may know a bunch of phone numbers, but we don't want it handing out personal information. Content filtering remains an open problem.”



In general, OpenAI's technology cannot reliably recall specific details, such as links to libraries and documentation, well enough to power applications like Stack Roboflow. At the same time, it is good enough to accidentally blurt out someone's personal information in a conversation.



Talk to machines long enough and the dialogue gets weird. Massive text-generating neural networks can produce fantastical stories about talking unicorns. They can be coaxed into writing dystopian essays warning of the dangers of AI. Or, on a more practical note, they sometimes spit out people's phone numbers.



The appearance of real personal information in the output of AI models has caused alarm before. Researchers have warned for years that machine learning models can reproduce information contained in their training data. This affects neural networks of all kinds, not just giants like OpenAI's GPT-2 and GPT-3 or Google's Meena.



OpenAI's GPT-3 filter will inspect the output, rewriting the text and replacing potentially real phone numbers with fake ones. For example, if it sees a ten-digit number starting with a real area code, it will swap it for something obviously bogus, like 111-111-1111 or 012-345-6789. Other types of information, such as addresses, do not have such a clear structure and will be harder to filter out. OpenAI is looking for something smarter and more elegant than a pile of regular expressions in the code.
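

For phone numbers, the regular-expression baseline the company wants to improve on could look roughly like the sketch below; the pattern and the placeholder are illustrative and are not OpenAI's actual filter.

    # Rough sketch of a regex-based phone filter, for illustration only:
    # rewrite anything shaped like a North American phone number into an
    # obviously bogus placeholder.
    import re

    PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")

    def scrub_phone_numbers(text: str) -> str:
        # A real filter might first check the leading three digits against a
        # list of genuine area codes before deciding to rewrite the match.
        return PHONE_RE.sub("111-111-1111", text)

    print(scrub_phone_numbers("Call the station at (503) 555-0187 after 9am."))
    # -> Call the station at 111-111-1111 after 9am.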



Addresses contain numbers and words in a variety of formats, lengths, and spellings. The output filter needs to be able to tell when a string of characters looks like an address or another form of personal data, and when it looks innocuous. There are clues in the text, such as the word "street" or numbers that look like postal codes, but they are not always present, and the filter will probably let some cases slip through.
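

A crude heuristic along those lines might combine several weak clues, as in the sketch below. The keyword list, the patterns, and the two-clue threshold are assumptions made purely for the sake of example, not OpenAI's approach.

    # Illustrative heuristic: flag a span as address-like when it combines
    # two or more independent clues (a street-style keyword, a ZIP-shaped
    # number, or a house-number pattern).
    import re

    ADDRESS_HINTS = re.compile(
        r"\b(street|st|avenue|ave|road|rd|boulevard|blvd|drive|lane|suite)\b",
        re.IGNORECASE,
    )
    ZIP_LIKE = re.compile(r"\b\d{5}(?:-\d{4})?\b")          # e.g. 97301 or 97301-1234
    HOUSE_NUMBER = re.compile(r"\b\d{1,5}\s+[A-Z][a-z]+")   # e.g. "1432 Oak"

    def looks_like_address(span: str) -> bool:
        clues = sum(bool(p.search(span)) for p in (ADDRESS_HINTS, ZIP_LIKE, HOUSE_NUMBER))
        return clues >= 2

    print(looks_like_address("Send mail to 1432 Oak Street, Salem, OR 97301"))  # True
    print(looks_like_address("The street was quiet at 5 pm"))                   # False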



Nor can personal data simply be stripped from the training data: doing so could remove useful context that matters for training the neural network. The model may need to learn the connections between addresses, phone numbers, names, and the words around them, for example to understand whether a passage is about a business or a family, a loved one or a complaint about an organization. That, again, is why the filter is applied to the output.



“With many models, you need to be very careful about serving the generated text directly to the user without processing it, or making it publicly available,” Mason said.



“This particular problem with personal information is not as dangerous as the amount of bias and the inappropriate statements a neural network can produce. You have to proceed with caution and think about where things might go wrong and how. Real applications will require multiple stages of testing.”



Currently, only a select few beta testers have access to GPT-3 via the API, and OpenAI plans to charge users for access to the model. The company did not comment on the problem described here.


