#21: We have to talk about GPT-4o
Did you see the movie Her? Yes, we did too… and it sure looks like the people over at OpenAI took the concept of the film and straight up built an AI tool that does exactly that. In this week’s Substack we’ll take a deep dive into the impactful new voice functionality and GPT version (GPT-4o, short for “omni”) that OpenAI announced this past Monday. Let’s go!
So what makes GPT-4o different from GPT-4?
It’s basically the same model family, but the main differences are processing speed and multimodality. GPT-4o can reason across audio, vision, and text in real time. GPT-4 can do those things too, but not in real time (it has to ‘think’ about it for a second), and it certainly can’t do them all simultaneously. On top of that, GPT-4o will be free for all users (although there is a usage limit, which will be much higher for paying users).
As OpenAI describes it on their website:
GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models.
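For the developers among us: below is a minimal sketch of what a GPT-4o request through OpenAI’s Python SDK could look like, sending text and an image together in one call. The prompt and image URL are just placeholders for illustration.

```python
# Sketch: asking gpt-4o about an image via OpenAI's Python SDK.
# Assumes OPENAI_API_KEY is set in the environment; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this picture?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/dog.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```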
It’s lightning fast. Let’s take a look at some mind-blowing examples from OpenAI.
The example below shows that while you talk to the AI, it can simultaneously interpret the video coming from your camera feed and respond accordingly.
It is also possible to let two AIs interact with each other through voice. What makes this special is that the model can differentiate between multiple voices, and therefore knows, for example, which instructions to follow.
Humanlike interaction
The thing that makes this voice AI stand out from competitors like Siri or Google Assistant is the way it talks. OpenAI really went the extra mile to make it sound humanlike: it even sighs and takes breaths. Think about how mind-blowing it is that a computer takes breaths! In the sample below, it shows enthusiasm when seeing a dog.
Move over, Siri
Multiple sources stated over the past week that OpenAI and Apple are close to an agreement to put GPT models in Apple’s products. If this happens, it will most likely not be long before they replace Siri and its functionality. Read more about the deal here.
Humane, the creator of the AI Pin that didn’t get great reviews, announced that it will incorporate the new GPT model in its product, hopefully drastically boosting the device’s speed and quality. More info here.
Other interesting developments around OpenAI this week
OpenAI has signed a deal for access to real-time content from Reddit’s data API, which means it can surface discussions from the site within ChatGPT and other new products.
The deal will also “enable Reddit to bring new AI-powered features to Redditors and mods” by using OpenAI’s large language models to build applications. OpenAI has also signed up to become an advertising partner on Reddit.
After a few months of uncertainty, Ilya Sutskever (the big brain behind the success of OpenAI’s GPT models) announced that he is leaving the company.