How does ChatGPT algorithm work?

ChatGPT (Chat Generative Pretrained Transformer) was developed in November 2022 by the US company OpenAI. It is a language model that allows its users to communicate with a bot in an online chat in real time. The bot is able to have a conversation in multiple languages, answer questions, convey information on many topics, or share ideas.
In addition to these diverse capabilities, ChatGPT excels at remembering conversations so that it can consider previous replies and the user can communicate corrections to it. So it is a smart and innovative tool that facilitates communication and access to knowledge!

 

But how does ChatGPT work?
ChatGPT is an NLP (Natural Language Processing) algorithm that understands natural language and generates it independently. More precisely, it's a general public version of GPT3, a text generation algorithm specialized in writing articles and analyzing emotions. So ChatGPT works like GPT3, thanks to a model pre-trained on a massive corpus of 500 billion pieces of text data. It uses two different types of learning: supervised learning and reinforcement learning.In the supervised learning phase, it receives conversations in which both roles (bot and user) are played, so that the data is labeled (questions and associated expected answers). During the reinforcement learning phase, the previous interactions are used to classify the responses. This ranking is done by human trainers (Reinforcement Learning from Human Feedback) and allows the creation of a reward model based on this ranking.


 

In this way, the algorithm continues to train itself during the interaction with the users in addition to the pre-training. This allows him to remember the context and remember the messages of a conversation.

Reinforcement Learning from Human Feedback in Detail

As mentioned earlier, the reinforcement learning phase is more specifically a reinforcement learning from human feedback (RLHF) phase that works with real human trainers. This phase is divided into two steps, which we will explain in more detail:
After the supervised learning phase is performed on labeled data and a supervised font is learned, a Supervised Fine Tuning (SFT) model is generated. The human trainers then vote on the relevance of the model outputs and create a comparison dataset on which to train an RM (Reward Model).

The RM Reward Model l is optimized using the PPO Reinforcement Learning algorithm. The PPO algorithm is an "on-policy" algorithm that learns and updates a current policy by relying directly on the actions and rewards received. This generates a new model, the so-called policy model.


Ask ChatGPT for information about Data Scientists!

Now that we understand the main models and algorithms on which ChatGPT is based, let's test its performance together.
To do this, we join the chat at the following address: https://chat.openai.com/auth/loginThen we ask the bot to describe the role of a data scientist. To get an optimized answer, we use a precise prompt, ie a formulation that starts the conversation clearly.

ChatGPT is very capable of educating us about the data scientist profession and continuing the conversation we started. That's just a tiny glimpse of the capabilities of this tool, which is not only a source of information but can also write a text, summarize another, or suggest topic-related content. Its development could therefore compete with the copywriters!
What does ChatGPT look like on the developer side?

ChatGPT also has capabilities typically typical of computer developers. It can generate code in different programming languages ​​(Python, Java, C++...) and develop an algorithm to solve a problem. To achieve such a result, one just has to tell it clearly what the code to be generated should return. It also makes its mark in the field of debugging, being able to identify the source of a computer error and fix it like any other debugger software.
ChatGPT is also very useful for data engineers because it can simulate a virtual machine (VM) with a Linux terminal.

Finally, ChatGPT can also detect vulnerabilities in a program.
So ChatGPT is a model for NLP that works both from an editorial point of view and from a computer science point of view, in many areas!

ChatGPT - What are its limits?
ChatGPT responds to our question: "I'm a language processing model trained by OpenAI. My knowledge is limited to the shutdown date of my training data which is 2021. I can't surf the internet to check information or access data that isn't part of my memory. I do my best to answer questions accurately and completely, but my answer may not always be accurate or current."

Since its launch, the main criticisms of ChatGPT have been related to its time limit, as its knowledge stops at events before 2021, and wrong answers, which can lead to wrong information being shared, even if the error rate is minimal.
As far as the code goes, ChatGPT also has its limitations as the generated code can contain many bugs above a certain level of difficulty. The tool is limited to classic and repetitive programs, but can e.g. B. not perform computer analysis. After all, its cybersecurity skills are too easily accessible and many fear that they could be misused by hackers for malicious purposes.

From an ethical point of view, the tool faces other problems. Due to numerous cases of plagiarism, its use has been banned and its access banned from the computer stations of some American schools.
Finally, like any statistical model, ChatGPT has emotional limitationsUnlike human intelligence, it has no thoughts, no intuition, no morals, and no emotions either, which can pose a certain danger.

Like any innovation, ChatGPT has its limitations. It nevertheless remains an artificial intelligence tool with great potential, and its performance only gets better over time!

Comments

Popular posts from this blog

advantages and disadvantages of ChatGPT

Dangers of Artificial Intelligence