OpenAI Reveals GPT-4o Featuring New Voice, Text and Visual Capabilities

Published: May 14, 2024
A photo taken on Oct. 4, 2023 in Manta, near Turin, shows a smartphone displaying the logo of the artificial intelligence OpenAI research laboratory. (Image: MARCO BERTORELLO/AFP via Getty Images)

OpenAI, creator of the popular ChatGPT artificial intelligence (AI) platform, has revealed the latest iteration of the technology, GPT-4o, which boasts a variety of new ways for humans to interact with it, including by voice, text, and vision.

GPT-4o (GPT-4 omni) is a multilingual, multimodal generative pre-trained transformer that was revealed by the company’s chief technology officer, Mira Murati, during a livestream demonstration on May 13 and was released, in part, on the same day.

According to Murati, GPT-4o is twice as fast as its predecessor, GPT-4 Turbo, 50 percent cheaper to operate, and has rate limits five times higher.

What makes GPT-4o stand out from other AI platforms is its ability to accept and generate any combination of text, audio, and images, and to provide real-time voice responses, according to OpenAI.

Conversing with the platform is now similar to speaking with a human, as the technology boasts what OpenAI calls “emotive voices”: computer-generated voices that can imitate emotions and add human-like inflections to their output.

The platform is also more accessible than ever before, offering services in 50 languages.


What can it do?

GPT-4o seems to have solved a problem that has been dogging AI image generators for some time: producing readable text in images.

OpenAI says that GPT-4o can now understand text descriptions and produce legible text in the images it creates, a task many other AI image generators still struggle with.

In addition, the platform can now also act as a translator in real time.

Part of the livestream included a conversation between someone speaking Spanish and someone speaking English, with the platform seamlessly translating between them.

The AI can now also utilize a device’s camera to “see” and describe its immediate surroundings, a potentially indispensable tool for the visually impaired.

In one demonstration, the platform was able to see that a birthday was being celebrated after noticing a cake and candle in the room. In another scenario, it recognized someone playfully throwing up “bunny ears” behind one of the presenters.

Sal Khan, the founder of Khan Academy, was in attendance and demonstrated how GPT-4o can act as a tutor. The AI was able to see a math problem displayed on an adjacent tablet and gently nudge the student toward solving it.

The AI can now also attend and participate in virtual meetings, and OpenAI demonstrated how it can be used to prepare for a job interview.

OpenAI also demonstrated how the AI can parse computer code and explain, in plain language, what it does and how it works.


Not everyone was impressed by the AI’s new abilities. Elon Musk, who operates his own competing AI company, xAI, said the reveal “made me cringe” and made several other critical comments about the new platform.

Though Musk co-founded OpenAI in 2015 as a non-profit research organization focused on developing AI safely and ethically, he has had a contentious history with the group.

In 2018, he walked away from the organization, citing potential conflicts of interest with his work at Tesla.

When OpenAI transitioned into a for-profit company in 2019, Musk openly criticized the move, saying it contradicted the organization’s original mission.

Most recently, Musk filed a lawsuit against OpenAI alleging breach of contract for partnering with Microsoft and for keeping the code behind its AI products secret.

OpenAI says it will be rolling out many of the new features to both free and paying users over the coming weeks.