The recent releases of ChatGPT, Bard, DALL-E 2, and Stable Diffusion have caused a lot of excitement, and a lot of fear. In this article, we are going to dive deep into the real risks of AI.
The fear takes a few forms. Some are afraid that this will lead to computers becoming sentient and taking over the world. Others are afraid that it will lead to the elimination of jobs. Still others are afraid that it will lead us all to stop thinking and to depend too much on these new tools, resulting in less creativity and less progress.
I think all of the above fears are exaggerated and that the benefits of AI will likely outweigh the risks in the long run.
But there is another kind of risk that I am very concerned about, and that is the topic of this article.
The Risks of AI: Garbage In – Garbage Out
When computers first took hold in corporations around the world, and applications began performing calculations on data held in databases and generating reports from that data, a common refrain was heard in IT departments: “Garbage In – Garbage Out.” (Come to think of it, I have not heard the phrase spoken very often recently.) It was shorthand for the idea that the output of any computer program is only as good as the data entered into the database the program uses.
Initially, all of the data for a given program was entered by, and specifically for, the company using the application, often by the very department that used the data. This meant the company had significant control over the quality of the data and an understanding of its origins. Many companies did a poor job of controlling data quality and data entry, but they did have visibility into the source of the information and could assess its quality when needed.
AI Data Sources Have Changed
In the many years since then, much has changed. In addition to the data it generates and enters itself, a company's applications now rely on a great deal of data gathered from outside sources. Thousands of data sources, both public and private, can be purchased and fed into internal applications with very little control over, or visibility into, their quality. While the quality of internally generated data has tended to improve over time, thanks to better controls at the database and application level, there is far less visibility or control where external sources are concerned.
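To make that concrete, here is a minimal sketch in Python of the kind of quality gate a company might place between an external data feed and its internal applications. The field names, feed contents, and validation rules here are all hypothetical:

```python
# A minimal sketch of a data-quality gate for an externally sourced dataset.
# All field names, values, and thresholds are hypothetical, for illustration only.
import csv
import io

# Pretend this CSV arrived from a purchased external data feed.
EXTERNAL_FEED = """customer_id,annual_revenue,country
1001,250000,US
1002,,US
1003,-5000,DE
1004,480000,XX
"""

def validate_row(row):
    """Return a list of quality problems found in one row (empty list means clean)."""
    problems = []
    if not row["annual_revenue"]:
        problems.append("missing annual_revenue")
    elif float(row["annual_revenue"]) < 0:
        problems.append("negative annual_revenue")
    if row["country"] not in {"US", "DE", "FR", "GB"}:  # hypothetical whitelist
        problems.append(f"unknown country code {row['country']!r}")
    return problems

clean, rejected = [], []
for row in csv.DictReader(io.StringIO(EXTERNAL_FEED)):
    issues = validate_row(row)
    (clean if not issues else rejected).append((row, issues))

print(f"accepted {len(clean)} rows, rejected {len(rejected)}:")
for row, issues in rejected:
    print(f"  customer {row['customer_id']}: {', '.join(issues)}")
```

Even a simple gate like this restores some of the visibility a company loses when it stops generating its own data, because every rejected row points back to a specific problem in a specific source.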
The inclusion of these outside data sources makes validating the resulting calculations, reports, and other outputs critical. Today, there is still generally a human interpreting those reports and calculations and making decisions based on them. That interpretation is the last line of defense against ‘Garbage Out’. In almost any business application, an experienced user can spot an incorrect result. They might not know what is wrong, but they can tell that something is, sending developers and database administrators back into the data to find the problem. This plays out every day in every organization.
AI Magnifies Garbage In – Garbage Out
Traditional applications present the data they ingest in different forms (reports, graphs) so that humans can make decisions based on those presentations.
AI applications take this a very significant step further. They don’t present data that helps the user make a decision; instead, they make the decision for the user. This is a crucial difference, because the user no longer sees the semi-processed data that could have clued them in to a problem with it. Some AI models will list the sources and logic they used to reach a decision, but even that does not give visibility into the actual data behind it.
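A toy illustration of the difference, in Python, with entirely made-up suppliers and ratings (including a deliberate data-entry error):

```python
# The same underlying data, surfaced two ways. The 45.0 rating is a
# deliberate data-entry error on a 1-5 scale.
records = [("supplier A", 4.2), ("supplier B", 4.5), ("supplier C", 45.0)]

# Traditional report: the human sees the rows and can spot that 45.0 looks wrong.
print("Supplier ratings report:")
for name, rating in records:
    print(f"  {name}: {rating}")

# AI-style answer: only the decision is surfaced, so the bad datum is invisible.
best = max(records, key=lambda r: r[1])
print(f"\nRecommendation: choose {best[0]}")
```

The report gives an experienced user a chance to catch the garbage; the recommendation quietly acts on it.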
AI models learn from the data they ingest, so if a well-developed and tested AI model had perfect data, the results would be, well, perfect. But AI models, especially general-purpose models like the ones mentioned above, are trained largely on free, publicly available datasets. The quality of this data is suspect at best. And the way the data is organized (in effect, the underlying data model) can itself influence the inferences the models draw.
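Here is a small, self-contained sketch of that effect using Python and scikit-learn. The data is synthetic and the corruption rule is invented, but it shows how the same correct modeling code produces a worse model when the training data is systematically mislabeled:

```python
# Garbage In - Garbage Out in model training: the same model, trained once on
# clean labels and once on systematically corrupted labels. Synthetic data;
# requires scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Systematic "garbage": mislabel every training example whose first feature is
# above its median, simulating a biased upstream data source.
noisy = y_train.copy()
biased = X_train[:, 0] > np.median(X_train[:, 0])
noisy[biased] = 1 - noisy[biased]

for name, labels in [("clean labels", y_train), ("biased labels", noisy)]:
    model = LogisticRegression(max_iter=1000).fit(X_train, labels)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.2f}")
```

The point is not the particular accuracy numbers, which will vary, but that the second model confidently learns a wrong relationship: no amount of correct modeling code can compensate for a bad source.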
The Risks of AI: A Real-World Example
I listen to a podcast called the All-In Podcast, which features four well-known investors who talk about politics, investing, and other interesting topics. In an episode shortly after the release of ChatGPT, they asked the tool for a profile of one of the hosts, David Sacks. The model produced a very accurate-looking profile, but in the footnotes it attributed a number of articles to him that he did not write. I suspect he had commented on those articles, and the model drew inaccurate inferences about his involvement in them.
This is a perfect example of the risk of not having control over, or visibility into, the datasets that AI models use.
The Real Risks of AI
So in my opinion, the real risk of AI is humans taking action based on decisions and answers generated by models built on uncontrolled data. The example I gave is a trivial one, but it is easy to imagine situations where acting on a faulty AI decision could be disastrous (choosing the proper building materials, for one).
So after all these years, awareness of Garbage In – Garbage Out is more important than ever.