Date: 2023-09-26

OpenAI, the artificial intelligence research lab behind ChatGPT, is set to introduce voice and image capabilities to its popular language model. The update will allow users to hold voice conversations with ChatGPT and show it images to discuss, marking a notable step forward in the evolution of AI communication.

The voice and image capabilities are designed to provide a more intuitive interface and expand the ways the AI can be used in daily life. For instance, users could discuss landmarks, plan meals, or seek assistance with homework using these new features. The features will roll out to ChatGPT Plus and ChatGPT Enterprise users over the next two weeks, with voice available on iOS and Android and images on all platforms.

The voice feature is powered by a new text-to-speech model and OpenAI’s open-source speech recognition system, Whisper. This allows users to engage in back-and-forth conversations with the AI, creating a more interactive and engaging user experience. Users can select from five different voices, created in collaboration with professional voice actors, adding a layer of personalization to the AI interactions.

The image feature, on the other hand, allows users to show one or more images to ChatGPT, which can then analyze and discuss the images. This image understanding is powered by multimodal GPT-3.5 and GPT-4, which apply language reasoning skills to a wide range of images. This feature could be particularly useful in scenarios where visual context is important, such as discussing a piece of artwork or identifying a landmark.

OpenAI is deploying these features gradually as part of a strategy to refine risk mitigations and prepare for more powerful systems in the future. The new voice technology presents potential risks, such as impersonation or fraud, which is why OpenAI is limiting it to a single use case, voice chat. Vision-based models present challenges of their own, such as hallucinations or misinterpretations of images, and OpenAI says it tested the model for these risks prior to deployment.

In the development of the vision feature, OpenAI collaborated with Be My Eyes, a mobile app for blind and low-vision people. This collaboration helped OpenAI understand the uses and limitations of the vision feature, ensuring it is as useful and accessible as possible. Technical measures have also been taken to limit ChatGPT’s ability to analyze and make direct statements about people, in order to respect individuals’ privacy.

OpenAI is transparent about the model’s limitations, discouraging high-risk use cases without proper verification and advising non-English users against using ChatGPT for transcription. This transparency is crucial in ensuring that users understand the capabilities and limitations of the AI, and use it responsibly.

Following the initial rollout to Plus and Enterprise users, access to the new features will be expanded to other user groups, including developers. This phased rollout approach allows OpenAI to gather feedback and make necessary adjustments before making the features widely available.

The introduction of voice and image capabilities to ChatGPT represents a significant advancement in AI communication. While these features open new opportunities for user interaction, they also come with risks and challenges. OpenAI’s gradual deployment strategy, collaboration with Be My Eyes, and transparency about model limitations point to a deliberate approach to managing these risks while pushing the boundaries of what AI can do.



