According to Odaily Planet Daily, OpenAI has released its latest flagship model, GPT-4o, which can reason across audio, vision, and text in real time. It is positioned as a humanlike, highly natural, ultra-low-latency personal voice assistant. The 'o' in GPT-4o stands for 'omni', marking a step toward more natural human-computer interaction: the model accepts any combination of text, audio, and images as input and can generate any combination of text, audio, and image outputs.

It can respond to audio input in as little as 232 milliseconds, averaging 320 milliseconds, which is similar to human response times in conversation. It matches GPT-4 Turbo's performance on English text and code, with significant improvements on non-English text, while the API is faster and 50% cheaper. Compared with existing models, GPT-4o is particularly strong at visual and audio understanding. Text and image input are available in the API and ChatGPT starting today, with voice and video input to follow in the coming weeks.
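Since the article says text and image input are available in the API today, here is a minimal sketch of what a mixed text-and-image request could look like. It assumes the OpenAI Python SDK (v1.x) and the model identifier "gpt-4o"; the image URL is a placeholder, and details not stated in the article are illustrative assumptions.

```python
# Minimal sketch of a multimodal GPT-4o request via the OpenAI Python SDK (v1.x).
# Assumptions not stated in the article: the "gpt-4o" model identifier and the
# example image URL are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model identifier
    messages=[
        {
            "role": "user",
            "content": [
                # Text and image parts combined in a single user message
                {"type": "text", "text": "Describe what is in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},  # placeholder
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Audio input and output are not shown here, since per the announcement voice support rolls out to the API in the coming weeks.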