#16 Meta is disrupting e-commerce content with AI
This week in our Substack we are zooming in on the latest disrupting AI in e-commerce by Meta, a new AI system that can detect emotions in your speech and the new voice engine by Open AI. Let’s go!
Better content for fashion E-commerce
Creating high-quality product imagery for fashion-related items has traditionally been an expensive and time-consuming process, giving large companies like Zalando a significant competitive advantage. However, this advantage may soon be diminished thanks to a new research paper called Garment3DGen, which simplifies the process to the point where models are no longer necessary. This innovation has the potential to make content production ten times cheaper and more accessible for smaller web-shops that don't have the substantial budgets of a company like Zalando.
The technology works by converting any simple, off-brand image of a clothing item into a textured 3D model. During this process, it also fills in missing areas, such as the sides and back of the garment, resulting in a fully textured model that can easily be combined with 3D models of digital models, making even video content possible.
In essence, this technology is rendering fashion modeling for brands like Zalando obsolete and, more importantly, providing smaller web-shops with access to high-quality content. This development eliminates a key advantage Zalando has held and may lead to a resurgence of smaller, niche web-shops that specialise in specific types or styles of clothing.
Hume AI: Detecting emotions in your speech
Hume, a lesser-known but highly intriguing startup, is making waves in the AI industry with its development of emotional expression tools. This innovative technology aims to provide AI models and other applications with a more nuanced understanding of human emotions. The potential implications are significant, as it could lead to the creation of Large Language Models and text-to-speech systems that can better incorporate human-like emotions in their generated text or speech output
Hume's technology is accessible through their API and a web-based playground, which includes a Voice-to-Voice AI with an EQ model capable of interpreting emotions from speech in real-time. As you speak to the tool, its interface displays a transcribed version of your last sentence alongside a detailed report of the emotions it detects in your voice.
OpenAI’s voice engine
Last week, OpenAI introduced their foundational voice/speech model named Voice Engine. Although it has been in use for some time, this is the first instance of a public discussion about the model, with a strong emphasis on safety and responsible use. Voice Engine was initially developed in late 2022, which is considered a significant time ago in the rapidly evolving Generative AI landscape. It has powered tools such as the text-to-speech API, ChatGPT Voice, and Read Aloud features.
Early adopters and use cases vary, including HeyGen utilising it for their translation platform and Age of Learning employing it to assist non-readers and children with natural-sounding voices that completely replace text formats. The model has even provided individuals who have partially lost their voice with the means to speak again, as demonstrated in the example below.
Current voice of someone who has lost their voice, controlling what is being said.
Reference audio that controls the tone of voice of the output voice.
Output of the model.
The purpose of this public announcement, despite the lack of direct access to the technology, is to address the safety implications associated with voice cloning models. It is crucial to recognise that voice cloning has the potential to significantly impact the creation of deep fake content. By making this announcement, the aim is to raise awareness among the general public and politicians about the consequences of this technology. More importantly, this publication serves as a call to action.
It urges everyone, particularly policymakers, to consider and establish necessary regulations and guidelines. These should ensure the safe and responsible use of voice cloning technology, while maintaining its accessibility to those who can benefit from it.