Processing and protecting data in the AI era
“Artificial intelligence is the future, and the future is here.” This is a quote from AI pioneer Dave Waters – at least according to ChatGPT. It’s difficult to verify whether Waters actually made this statement, because it is simply information ChatGPT has retrieved from the enormous data sets that form the building blocks of everything Large Language Models (LLMs) produce.
Now, it isn’t entirely outside the realm of possibility that an interview with Waters containing this very statement can be found in these data sets. But at the same time, it is equally possible that ChatGPT simply judged the statement to be plausible – without Waters ever having made it. The “facts” that AI draws from its data sets often sound plausible, but they aren’t always true. Sometimes they’re flat-out fabrications. ChatGPT itself points out this dilemma on its website: “ChatGPT may produce inaccurate information about people, places, or facts.” With this in mind, the most important maxim when dealing with AI ought to be: always question the credibility of any given claim.
Large Language Models: How do they work?
Large Language Models are algorithms. They evaluate the training data that is fed to them, recognize language patterns and, from there, deduce what the next word of a suitable answer might be.
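To make the next-word idea concrete, here is a deliberately tiny sketch in Python. The word probabilities are invented purely for illustration; a real LLM derives them from billions of example sentences and considers far more context than two preceding words.

```python
import random

# Toy next-word table. A real model learns these probabilities from its
# training data; the values here are invented for illustration only.
next_word_probs = {
    ("artificial", "intelligence"): {"is": 0.6, "will": 0.3, "cannot": 0.1},
    ("intelligence", "is"): {"the": 0.5, "here": 0.3, "useful": 0.2},
}

def pick_next(previous_two_words):
    """Choose the next word in proportion to its learned probability."""
    options = next_word_probs[previous_two_words]
    words, weights = zip(*options.items())
    return random.choices(words, weights=weights)[0]

print(pick_next(("artificial", "intelligence")))  # e.g. "is"
```

The point is simply that the model chooses what is statistically likely, not what it has verified to be true.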
This mixed, unchecked content of data sets introduces another problem: AI can sometimes propagate biases found in its training data. How does that work? “AI bias” means the model gives preference to certain mindsets or social stereotypes simply because they are statistically more prevalent in its data sets. These biases are then present, sometimes overtly, sometimes covertly, in AI output. In a bid to tackle this, the EU is currently drafting its AI Act, which will require AI providers to go the extra mile and, for example, screen data sets more closely. Companies and media organizations have also updated their codes of ethics to emphasize the importance of making people aware of such biases. In multilingual environments, the problem is intensified by cultural specificities. The best way to make sure AI-generated translations aren’t full of biases and blunders is still to insert a human into the process. This means scrutinizing a text for potential stumbling blocks before handing it over to AI, as well as doing a careful editing pass on the other end to ensure the result reflects the target audience’s context and expectations.
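The statistical mechanism behind such bias can be shown with a toy example. The miniature “corpus” below is invented and deliberately skewed; it merely demonstrates that a system which learns from frequency counts will reproduce whatever imbalance its data contains.

```python
from collections import Counter

# Hypothetical mini training corpus -- the skew is deliberate and made up.
corpus = [
    "the engineer fixed his code",
    "the engineer fixed his code",
    "the engineer fixed his code",
    "the engineer fixed her code",
]

# Count which pronoun follows "fixed" in the training sentences.
pronoun_counts = Counter(sentence.split()[3] for sentence in corpus)

# A purely statistical learner favors the majority pattern.
print(pronoun_counts.most_common(1))  # [('his', 3)]
```

Scale that effect up to billions of sentences and it becomes clear why skewed data quietly produces skewed output.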
Human-in-the-loop (also known as “expert-in-the-loop”) is an effective solution here. A human-in-the-loop workflow might look a little bit like this:
- A task needs to be completed.
- Somebody with the appropriate expertise considers and decides whether AI can assist with said task.
- If AI can assist, then its contributions and output are thoroughly reviewed and revised by a human.
- If AI can assist further, the expert gives it more precise instructions and prompts.
Effectively, the text is fine-tuned with the machine’s support until it meets the technical and linguistic requirements. Used this way, AI services such as ChatGPT from OpenAI, Luminous from Aleph Alpha or Bard from Google can be helpful tools: they can speed up certain steps in our workflows and may help save resources.
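Expressed as code, such a workflow might look like the sketch below. All of the names (ai_generate, expert, check and so on) are hypothetical placeholders rather than a real API; the point is that a human decides whether AI may assist at all, reviews every draft, and keeps refining the prompt until the result meets the requirements.

```python
def human_in_the_loop(task, ai_generate, expert, max_rounds=3):
    """Sketch of an expert-in-the-loop workflow (all objects are hypothetical)."""
    # Step 1: a human with the appropriate expertise decides whether AI can assist.
    if not expert.can_ai_assist(task):
        return expert.do_manually(task)

    prompt = expert.write_prompt(task)
    for _ in range(max_rounds):
        draft = ai_generate(prompt)          # Step 2: the AI produces a draft
        review = expert.check(draft)         # Step 3: a human reviews and revises
        if review.approved:
            return review.final_text
        prompt = review.refined_prompt       # Step 4: give more precise instructions
    return expert.do_manually(task)          # the expert, not the AI, has the last word
```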
Ultimately, AI’s ability to evolve is what keeps it “alive” – and that means it keeps updating its input as well. It feeds on the data that users submit with each new request. So, if an AI tool receives a prompt to generate a press release containing information about an upcoming business transaction, it can store this data for subsequent use. This is one way ChatGPT could potentially compromise confidentiality – for example, if it reveals this information to other users for whatever reason. And if you enter personal data, more than just information security is at risk: now we’re heading into data protection territory. Think of it like commuting on public transit: you wouldn’t discuss business secrets or confidential information where fellow passengers can hear. And you certainly wouldn’t leave sensitive documents, such as the unpublished sales figures for the next annual report, lying in the train’s snack bar. After all, some fellow travelers may well have an interest in seeing that information disclosed. Data confidentiality should be a top priority, especially on the internet.
ChatGPT vs. GDPR
Users can now choose whether or not to allow ChatGPT to save their chat history. But this still leaves GDPR compliance unresolved. Data collected by AI – not only the prompts you enter, but also technical information such as cookies or device data – is sometimes stored on servers in the United States, i.e., in a “third country” as defined by the GDPR. And wherever personal data is processed, the prior consent of the parties concerned must generally be obtained. On the whole, it’s best to exercise a considerable degree of caution with regard to sharing data online.
AI and cybersecurity
AI systems are also prime targets for cyber criminals. In addition to the large, publicly available AI platforms used by millions of people worldwide, in-house AI tools and engines also require high levels of security, because the immense amounts of data they hold have to be protected. Criminals have found ways to benefit from AI, too: phishing emails are becoming increasingly realistic, making them even harder to identify as such. And by using AI to evaluate publicly viewable social media profiles, hackers can tailor their attacks more precisely to specific targets.
Similarly, the number of deepfakes has increased in recent years. Cyber criminals can use digitally manipulated photos, videos or audio in a bid to blackmail companies and private individuals. If an explosive audio clip in which a CEO makes inflammatory comments circulates on the internet, it can quickly make big waves – even if the clip is a complete fake. Companies are therefore urged to have appropriate response strategies in place for the very real possibility that something like this occurs, and the public should adopt a healthy dose of skepticism when such recordings circulate.
Of course, AI is also a welcome tool for a company’s own cybersecurity: these algorithms are setting new standards for processing huge amounts of data in the shortest possible time. But the downside for IT security mirrors developments in other areas, such as communications: we don’t know how AI arrives at its results. Even AI experts aren’t 100% clear on what goes into the decision-making process. Is the data clean, sufficient and meaningful? Might a third party be influencing the AI tool? Have the appropriate models been applied? The key is not to use AI blindly or treat it as a replacement for a proper cybersecurity strategy. It is just one tool for improving corporate information security. Ultimately, the best intelligent security solutions are usually a combination of human and artificial intelligence.
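As a rough illustration of what such automated pattern detection can look like, the sketch below flags an unusual spike in failed logins. It is a stand-in only – real security tooling uses far more sophisticated models and far more data – but it shows the division of labor described above: the algorithm flags, a human analyst judges.

```python
from statistics import mean, stdev

# Hypothetical daily counts of failed logins; in practice these would come
# from a log pipeline, not a hard-coded list.
failed_logins = [12, 9, 14, 11, 10, 13, 240]

baseline = failed_logins[:-1]
threshold = mean(baseline) + 3 * stdev(baseline)

if failed_logins[-1] > threshold:
    # The tool only flags the anomaly; a human still decides whether it is
    # an attack, a misconfigured script, or simply bad data.
    print("Unusual spike in failed logins – escalate to the security team")
```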
Artificial intelligence can be useful to us humans in myriad ways. As we have seen, though, it can also have less than favorable implications in areas such as cybersecurity and ethics, to name just two. When dealing with this technology, it’s important to follow a few ground rules:
- Perform a credibility check and a cross-check of any information obtained from AI
- Question not only concrete data, but also underlying biases and stereotypes
- Never feed an AI confidential information
- Be aware that AI is able to imitate voices, images and more, and consider all the risks this implies
In some arenas, AI is already firmly rooted in everyday work. In others, its importance is only beginning to grow. In light of recent rapid developments, awareness of the legal requirements for data protection and data security, as well as other areas, is essential.
Above all, humans should remain the controlling authority over AI. Yes, it is an enormously helpful tool that will have a massive impact on our daily tasks. It can help us operate at a faster pace and inspire us with new ideas, and it can be a brilliant sparring partner for generating creative content. However, we should all proceed with caution, especially when it comes to confidential data, because in the vast majority of cases it is unclear what can happen to this data – and who might ultimately gain access to it.
Sources and further information:
SoSafe Ltd: “The security risks of ChatGPT – and how to avoid them” (sosafe-awareness.com)
PR-Werkstatt: “KI-Leitfaden für PR-Profis” [AI guidelines for PR professionals] (PR Report 3/2023)
European Parliament: “EU AI Act: first regulation on artificial intelligence” (June 14, 2023). https://www.europarl.europa.eu/news/en/headlines/society/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence
OpenAI: “New ways to manage your data in ChatGPT”. https://openai.com/blog/new-ways-to-manage-your-data-in-chatgpt
isits AG: “Künstliche Intelligenz und IT-Sicherheit: Chance oder Bedrohung?” [Artificial intelligence and IT security: opportunity or threat?]. https://www.is-its.org/it-security-blog/kuenstliche-intelligenz-und-it-sicherheit-chance-oder-bedrohung