On the Morning of May 14, OpenAI Officially Released the New Flagship Model GPT-4o, Marking a Historic Moment in the Global Generative AI Field
OpenAI has officially released its new flagship model, GPT-4o, on its official website. This model is capable of real-time reasoning across video, audio, and text, marking another historic moment in the global generative AI field.
GPT-4o’s Voice Version of ChatGPT Assistant
Built through GPT-4o, the voice version of the ChatGPT assistant can communicate with users across audio and video. For instance, GPT-4o can perform real-time translations, sing songs, solve math problems, and tell jokes. To put it simply, the jobs of tutors, translators, and secretaries may no longer be safe in the future.
The Movie “Her” (2013)
The movie “Her,” which won the Oscar for Best Original Screenplay in 2013, presented a “romance” between a human and an AI. The omnipotence of the AI character, Her, showcased the powerful and terrifying side of AI.
Eleven Years Later: GPT-4o
Eleven years later, GPT-4o has turned this movie into reality, allowing everyone to have their own “Her.”
GPT-4o Feature Showcase
OpenAI has showcased many features of GPT-4o. Here, the “AIGC Open Community” selects a few representative ones. For example, you can video call GPT-4o through a smartphone and ask it to interpret your thoughts.
First, an OpenAI employee took a video of the surrounding scene with their phone. Soon after, GPT-4o provided a description of the environment. When the employee asked GPT-4o to guess what he would be doing that day, it speculated it might be related to OpenAI, such as hosting a press conference.
When the employee mentioned that the press conference was related to “you,” GPT-4o’s response was somewhat terrifying. It showed surprise and a pause, as if it were human, which is a technical feature that no previous voice assistant has possessed.
OpenAI President and Co-founder Greg Brockman
Greg Brockman, the president and co-founder of OpenAI, had two GPT-4o voice assistants converse and sing to each other.
A Father’s Request
A father hoped that GPT-4o could tutor his son through a difficult math problem. Unlike previous versions of ChatGPT, which would give out all the answers at once, GPT-4o guided the child step by step on how to solve the problem, much like a tutor.
When the “AIGC Open Community” saw this, they felt a sense of concern for those tutoring teachers, as it is likely that even their jobs may not be safe after a few more evolutions.
Introducing GPT-4o to Friends
After spending a long time with GPT-4o, it’s probably time to introduce it to some friends. OpenAI showcased GPT-4o’s social side by directly interpreting a pet dog.
Real-time Translation with GPT-4o
What if you want to converse with a French, Serbian, or Hungarian person? You would have to use translation software, but the traditional kind is too slow and not suitable for communication. With GPT-4o, real-time translation is possible. By the time you finish speaking a sentence, GPT-4o has already translated it for you and output it in voice.
Impressions of GPT-4o
How do you feel about GPT-4o after seeing these examples? Isn’t it almost the same as the character Her from the movie? What’s more exciting is that OpenAI has announced that GPT-4o is available for free use, even for non-paying users.
GPT-4o Test Data
GPT-4o is a multimodal model capable of inputting and outputting text, video, and audio in a single neural network.
According to the official evaluation provided by OpenAI, GPT-4o’s voice response can be as short as 232 milliseconds, with an average response time of 320 milliseconds. The English text and code capabilities of GPT-4o are comparable to those of GPT-4 Turbo.
GPT-4o set a new high score of 88.7% in the MMLU evaluation, surpassing other renowned large models currently on the market, such as Claude 3 Opus, Gemini Pro 1.5, and Gemini Ultra 1.0.
Audio ASR Performance of GPT-4o
The audio ASR performance of GPT-4o has significantly improved the speech recognition capabilities for all languages compared to Whisper-v3, especially for those very rare minor languages. Moreover, its audio translation capabilities have surpassed Google’s Gemini.
Text and Image Functionality of GPT-4o
Currently, the text and image functionalities of GPT-4o are available for use in ChatGPT, and free registered users can also experience this feature.
API Access for Developers
Developers can access GPT-4o’s text and visual functionalities through the API. Compared to GPT-4 Turbo, GPT-4o has doubled the speed and reduced the price by 50%, while also greatly lowering the token limit.
Upcoming Release
In the coming weeks, OpenAI will launch the alpha version of the new voice mode GPT-4o in ChatGPT Plus. Let’s look forward to the arrival of an even stronger “Her.”
For more, you can visit GPT-4o.