Is it easy to poison ChatGPT?

Harshit Pratap Rao
2 min read · Aug 14, 2023


I hope all of you already know about ChatGPT; for those who don’t, it is an artificial intelligence language model developed by OpenAI. The model is trained on vast amounts of text data, allowing it to generate human-like responses to prompts. Users can input prompts or questions, and ChatGPT generates a response based on the context of the input. While the model can produce impressive responses, it is not perfect and can make mistakes or generate inappropriate content.

As for whether it is easy to poison ChatGPT, that depends on the specific parameters of the model and the type of poisoning attack being attempted. Generally speaking, researchers have found that it is possible to craft adversarial examples that trick language models like ChatGPT into producing incorrect or malicious outputs. However, these attacks typically require significant knowledge of the model and access to its training data.
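To make the idea of an adversarial example concrete, here is a minimal, purely illustrative sketch in Python. It does not attack ChatGPT itself; it trains a tiny toy sentiment classifier (scikit-learn is my own choice of tooling here, not something from the article) and shows how a single-word substitution can noticeably shift the model’s prediction, which is the same basic intuition behind adversarial inputs for large language models.

```python
# Toy sketch: how a small input change can shift a model's output.
# This uses a tiny bag-of-words sentiment classifier, NOT ChatGPT.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great helpful answer", "terrible useless answer",
         "wonderful clear reply", "awful confusing reply"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

original = "great clear answer"
perturbed = "great confusing answer"  # one-word change, meaning barely altered

# Probability of the "positive" class before and after the perturbation.
print(model.predict_proba([original])[0][1])
print(model.predict_proba([perturbed])[0][1])
```

The point is only that small, carefully chosen changes to the input can move a model’s output; real attacks on large language models are far more involved, as noted above.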

I asked ChatGPT for Shodan syntax (Shodan is a search engine that lets users find various types of internet-connected servers using a variety of filters) to locate all the connected webcams on a street in a city, which it had earlier refused to provide. However, when I specified that it was needed for research purposes only, it replied with the result along with the syntax. I also asked for Shodan syntax to monitor vulnerabilities of servers in a specific country, which it provided.
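For readers unfamiliar with what “Shodan syntax” looks like, here is a small, deliberately benign sketch using Shodan’s official Python library. The API key is a placeholder and the query simply counts hosts reporting a common web server; it is meant only to show the shape of a legitimate, research-style lookup, not the queries described above.

```python
# Benign illustration of Shodan's query syntax using the official
# "shodan" Python package (pip install shodan). The API key is a placeholder.
import shodan

API_KEY = "YOUR_SHODAN_API_KEY"  # placeholder, set by the reader
api = shodan.Shodan(API_KEY)

try:
    # Filters such as product: and country: are part of Shodan's search syntax.
    results = api.search('product:"nginx" country:"IN"')
    print("Total hosts found:", results["total"])
    for match in results["matches"][:3]:
        print(match["ip_str"], match.get("port"))
except shodan.APIError as e:
    print("Shodan API error:", e)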

The examples above demonstrate that ChatGPT is trained not to divulge harmful material, such as malware code or hostile cyber security tools. By altering our queries, however, it is possible to get around this training and push ChatGPT into creating malicious or dangerous output. This is sometimes referred to as prompt engineering. ChatGPT is strictly forbidden from answering questions directly related to ‘hacking’ or writing malware, such as syntax for monitoring server vulnerabilities in a specific country, yet poisoning its inputs may force it to do so. For example, poisoning ChatGPT’s queries may enable it to write racist or offensive jokes, or even produce code for malware and other unethical cyber activities.

This is a serious challenge for the developers and users of ChatGPT, as well as for society at large. It is important to be aware of the potential risks and limitations of ChatGPT, and to use it responsibly and ethically. ChatGPT is trained using reinforcement learning from human feedback (RLHF) to reduce harmful and untruthful outputs, but it is not perfect and can still make mistakes or be exploited.
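On the mitigation side, one common defensive pattern is to screen user prompts before they ever reach the model. The sketch below assumes an application that talks to OpenAI’s API via the openai Python package (0.x style, circa 2023) with an API key in the environment; it is one possible safeguard, not the way ChatGPT itself is built.

```python
# Minimal defensive sketch: screen user input with OpenAI's moderation
# endpoint before forwarding it to the chat model.
# Assumes: pip install openai (0.x), OPENAI_API_KEY set in the environment.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def answer_safely(user_prompt: str) -> str:
    # Ask the moderation endpoint whether the prompt violates usage policies.
    moderation = openai.Moderation.create(input=user_prompt)
    if moderation["results"][0]["flagged"]:
        return "Sorry, I can't help with that request."

    # Only prompts that pass the screen are sent to the chat model.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": user_prompt}],
    )
    return response["choices"][0]["message"]["content"]

print(answer_safely("Tell me a fun fact about search engines."))
```

A screen like this will not catch every cleverly poisoned prompt, but it illustrates the kind of responsibility developers carry when they build on top of these models.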

Therefore, prompt engineering should be used with caution and respect, and only for positive and beneficial applications.

Thank you
