OpenAI logo is seen near computer motherboard in this illustration taken Jan 8, 2024. [Photo/Agencies]
Compared with its previous versions, which supported only text input, the latest ChatGPT-4o, released on Tuesday, allows users to input audio, images and even documents, and it responds to all of them promptly.
This upgrade marks a giant step forward for OpenAI. Although domestic AI models such as Baidu Yiyan already allow users to chat with them using images, audio and documents, OpenAI, the onetime leader of the industry, had fallen behind on that front until ChatGPT-4o was released.
That it recognized the gap and took timely steps to catch up reaffirms that "human-level response", a term OpenAI CEO and cofounder Sam Altman has used, is the future of AI technology. AI is often seen as an information processing assistant that "uses human forms of interaction to communicate". The addition of real-time voice interaction undoubtedly brings the user experience of large models closer to people's expectations of an "AI assistant".
For example, in the past, those relying on AI to solve a math equation could only type the equation manually into a chat window. Now, with ChatGPT-4o, users can work on an equation on their computers and ask ChatGPT-4o to solve what appears on the screen. In other scenarios, they can even point their smartphone camera at a book or notebook and ask ChatGPT to act on the content it sees. In fact, according to the OpenAI product release, ChatGPT can even read the user's expression through the camera and comfort them if they seem tense.
These new features, such as "screen viewing" and "emotion sensing", suit everyday human communication habits. Users will feel that the AI is serving them, rather than, as in the past, that they were serving the AI.
ChatGPT-4o offers free services to users. OpenAI Chief Technology Officer Mira Murati and CEO Altman have stressed that the "free-to-use" strategy is the future of their company. It should be noted that free services benefit not only users but also the service provider, as they enable the latter to expand its business, collect more data from free users to train its large language models more effectively, and then build new high-end products for paid users. That is a virtuous cycle, as all sides benefit from it, making the free model sustainable.