The recent spate of announcements by tech titans such as Microsoft, Google, Apple,
OpenAI, NVIDIA, et al. has started a serious buzz among technology gurus and
business leaders. This buzz is a continuation of the overarching headlines that
emanated from Davos 2024, where the consensus was that AI, and Generative AI in
particular (this was specifically mentioned), is the means to, firstly, transform
society and, secondly, achieve greater revenues. While computer science graduates are revelling in the
availability of new AI technologies, most of us are not sure what the buzz is about.
Sure, we are all using ChatGPT, but how is this going to transform our lives? This
article attempts to unpack the technologies associated with AI, especially
Generative AI, which is at the heart of the buzz.
What is Generative AI?
To answer this, we need to go one step back and properly understand Artificial
Intelligence (AI). Broadly speaking, AI can be equated to a discipline. Think of science
as a discipline; within science we get chemistry, physics, microbiology, etc. In the
same way, AI is a broad discipline, and within AI there are several subsets such as
Machine Learning (ML), algorithms that perform specific tasks, Expert Systems (which
mimic human expertise in specific topics to support decision making), Generative AI, etc.
Generative AI (Gen AI) has been making significant strides, especially since
December 2022. On 30 November 2022, OpenAI released ChatGPT, which reached
100 million users in just 2 months, compared to 78 months for Google Translate, about
30 months for Instagram, and 9 months for TikTok. Generative AI is a major
advancement, referring to AI that creates new content, such as text, images,
language translations, audio, music, and code. While currently focused on these
outputs, Gen AI’s potential is vast and could eventually encompass areas like urban
planning, therapies, virtual sermons, and esoteric sciences. Generative AI is
essentially a subset or specialized form of AI, akin to how chemistry is a subset of
science. In AI terminology, these systems are called “models,” with ChatGPT being
one example.
Unpacking GPT
The term “Chat” in ChatGPT signifies a conversation, whether through text or voice,
between the user and the system. “GPT” stands for Generative Pre-trained
Transformer. “Generative” refers to the AI’s ability to create new content, while
“Pre-trained” highlights a core concept in AI: a model is first trained on vast
datasets, before anyone uses it, and what it can do is bounded by that training. For
instance, a model trained only for translation between languages can’t tell you how
fast a Ferrari is, but it can explain linguistic origins, such as the name Ferrari
deriving from the Italian word for “blacksmith”. This capability is honed through deep
learning, where the model learns associations and context from extensive data. The
training process involves predicting the next word in a sequence based on the prior
words. Because the model predicts what is statistically likely rather than what is
true, it can sometimes produce errors known as “hallucinations”: plausible-sounding
but wrong outputs such as “the pillow is a tasty rice dish”. This demonstrates how AI
learns and operates within defined parameters without human intuition.
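To make next-word prediction concrete, here is a minimal sketch in Python. It is not how ChatGPT is built: it simply counts which word follows which in a tiny corpus invented for this illustration, then always picks the most frequent follower. The corpus, the spelling “pilau” for the rice dish, and the function names predict_next and continue_text are all assumptions made for the example. Even so, it shows how a purely statistical predictor can produce a fluent sentence that is wrong.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, purely for illustration (not real training data).
corpus = (
    "the pilau is a tasty rice dish . "
    "the pillow is a soft cushion . "
    "the pilau is a tasty rice dish ."
).split()

# Count how often each word is followed by each other word.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1


def predict_next(word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None


def continue_text(start, max_words=6):
    """Extend `start` one predicted word at a time, stopping at a full stop."""
    words = start.split()
    for _ in range(max_words):
        next_word = predict_next(words[-1])
        if next_word is None or next_word == ".":
            break
        words.append(next_word)
    return " ".join(words)


print(continue_text("the pillow"))
```

In this toy corpus, “a” is followed by “tasty” more often than by “soft”, so the predictor continues “the pillow is a ...” with the rice-dish wording. Real models look at far more context than one previous word, which reduces such errors but does not remove them.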
The key here is that the model has to be trained, firstly, on vast amounts of data
and, secondly, with meticulous attention. This leads us to another common piece of
jargon used in the AI world: Large Language Models, or LLMs. In fact, ChatGPT is a
Large Language Model! If we have to define an LLM, it could be defined as a
next-word prediction tool. Where do the developers of LLMs get the data to carry
out the pre-training? They download a huge corpus of text, mainly from websites
such as Wikipedia, Quora, public social media, GitHub, Reddit, etc. It is worth
mentioning here that it reportedly cost OpenAI $1b (yup, one billion USD) to create
and train ChatGPT; they were funded by Elon Musk, Microsoft, etc. Perhaps that is
why it is not an open-source model!
Let’s now unpack the ‘T’ of ‘GPT’, which refers to the Transformer. This is the ‘brain’
of Gen AI. A Transformer is a machine learning model, specifically a type of neural
network. In its original design it contains 2 important components, an Encoder and a
Decoder (GPT-style models keep only the Decoder half). Here’s a simple question that
could be posed to ChatGPT: “What is a ciabatta loaf?”. Upon typing the question into
ChatGPT, the question goes into the Transformer, which reads every word in the
context of the others. The 2 operative words in the question are ‘ciabatta’ and ‘loaf’.
The word ‘ciabatta’ has 2 possible meanings: footwear and an Italian sourdough bread
(ciabatta means ‘slipper’ in Italian; since the bread is shaped like a slipper, it is
called ‘ciabatta’).
Given the context of “loaf,” ChatGPT, a pre-trained model, would prioritize food items
over other meanings. In other words, seeing “loaf” next to “ciabatta,” it would likely
choose “bread” over “footwear.” The model then generates its answer one word at a
time and can predict associations, such as identifying ciabatta as an Italian
sourdough bread. However, ChatGPT’s responses aren’t always flawless, as accuracy
depends on its training and fine-tuning. Despite occasional errors, its answers are
often remarkably precise, reflecting meticulous development involving techniques
like “attention,” which lets the model weigh how relevant each word in the input is
to every other word.
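The idea behind “attention” can be sketched with a few lines of Python. This is a heavily simplified illustration, not the mechanism as implemented in ChatGPT: each word is given a hand-made pair of numbers (how food-like and how footwear-like it is), whereas real models learn much longer vectors and additional weight matrices. The numbers, the embeddings table, and the function name attend are all invented for this example. The sketch scores every word in the question against ‘ciabatta’, turns the scores into weights that add up to 1, and blends the word vectors using those weights.

```python
import math

# Hand-made 2-number "embeddings": [food-ness, footwear-ness].
# These values are invented for illustration; real models learn them.
embeddings = {
    "what":     [0.0, 0.0],
    "is":       [0.0, 0.0],
    "a":        [0.0, 0.0],
    "ciabatta": [1.0, 1.0],  # ambiguous: bread or slipper
    "loaf":     [2.0, 0.0],  # clearly food
}


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query_word, sentence):
    """Simplified scaled dot-product attention for a single query word."""
    query = embeddings[query_word]
    scale = math.sqrt(len(query))
    # Score each word by how well its vector lines up with the query's.
    scores = [
        sum(q * k for q, k in zip(query, embeddings[word])) / scale
        for word in sentence
    ]
    weights = softmax(scores)
    # Blend all word vectors according to their attention weights.
    blended = [
        sum(wt * embeddings[word][i] for wt, word in zip(weights, sentence))
        for i in range(len(query))
    ]
    return weights, blended


sentence = ["what", "is", "a", "ciabatta", "loaf"]
weights, blended = attend("ciabatta", sentence)
for word, weight in zip(sentence, weights):
    print(f"{word:>8}: {weight:.2f}")
print("food-ness vs footwear-ness:", [round(x, 2) for x in blended])
```

With these made-up numbers, ‘loaf’ receives far more weight than filler words such as ‘what’, and the blended result for ‘ciabatta’ ends up more food-like than footwear-like, which mirrors the bread-over-slipper choice described above.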
Did you know that Gen AI has been in use since well before the advent of ChatGPT? In
2006 Google Translate was arguably the first Gen AI tool available to the public; if you
fed in, for example, “Directeur des Ventes” and asked Google Translate to translate the
French into English, it would return “Sales Manager”. (By the way, the Transformer
was first introduced by Google researchers, in 2017.) And then in 2011 we were
mesmerised by Siri, which was initially such a popular ‘toy’ among iPhone users.
Amazon’s Alexa followed, together with chatbots and virtual assistants that became a
ubiquitous feature of our lives; these can all be regarded as Gen AI models. As can be
seen, we’ve been using Gen AI for a while; however, no one told us that these ‘things’
were Generative AI models!