LiveWall's bi-weekly #2: Chat with DALL·E 3, cloning our own voice and spatial computing with Meta Quest 3
This week in our bi-weekly innovation update we take a closer look at the new and improved DALL·E image generation model from OpenAI and its new interface, explore the spatial computing trend around the new Meta Quest 3 mixed reality headset, clone our own voice, and share our latest innovation blog about how we used Three.js in our campaign for McDonald's.
AI is progressing fast. Until recently, almost all of the most prevalent AI tools were text-based. Advances in technology and a huge influx of capital are changing this: new image and audio models are getting better by the day. They are reaching a point where they are easy to use, cheap and of high quality, a combination that makes them accessible to a much wider group of people instead of just a small group of highly technical people with a lot of financial resources.
DALL·E 3 in ChatGPT
While Midjourney has been the king of high-quality image generation in recent months, its interface and pricing have held further adoption back. This gap is now being filled by OpenAI's new image generation model, DALL·E 3. It is a huge improvement over the old DALL·E 2 model, which had been struggling to keep up with Midjourney and Adobe Firefly.
While one may argue that the image quality isn't as good as Midjourney's, it comes remarkably close. Comparing the two, there are some noticeable differences in the approaches they take. Midjourney is a lot more artistic and style-specific: you can create really impressive, high-quality images, but this comes with a disadvantage, as you have to be really specific in your prompting.
Getting to the "perfect" image can take longer because of this. DALL·E focuses on accuracy instead of artistic impression, which makes for a more straightforward process but a less imaginative output. DALL·E's user interface is also a lot better: the integration within ChatGPT makes it really easy to use and accessible to anyone with a ChatGPT Plus subscription.
While generating images within ChatGPT, you can have a conversation about what is being generated, which is a great feature if you want to tweak certain aspects of an image. You could even use it to generate your own fairy tale, with the same characters returning in different settings.
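For those who prefer code over the ChatGPT interface, DALL·E 3 is also exposed through OpenAI's Images API. A minimal sketch of such a request, assuming the official `openai` Python package and an `OPENAI_API_KEY` in your environment (parameter names reflect the API at the time of writing, so check OpenAI's documentation before relying on them):

```python
def build_dalle3_request(prompt: str, size: str = "1024x1024") -> dict:
    """Build keyword arguments for OpenAI's image generation endpoint."""
    return {
        "model": "dall-e-3",   # DALL·E 3 model identifier
        "prompt": prompt,
        "size": size,
        "n": 1,                # DALL·E 3 generates one image per request
    }

kwargs = build_dalle3_request("A watercolor fox reading a fairy tale to forest animals")

# With the `openai` package installed and OPENAI_API_KEY set, you would then call:
#   from openai import OpenAI
#   url = OpenAI().images.generate(**kwargs).data[0].url
print(kwargs["model"])
```

Note that, unlike the ChatGPT integration, the raw API has no conversation memory: each request stands on its own, so consistency between images depends entirely on your prompt.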
ElevenLabs voice cloning
(listen to this section with Bas's voice, generated in ElevenLabs)
As image generation makes big leaps, audio generation isn't lagging behind. Generating audio with a predetermined voice is cool and useful, but it gets really interesting when you can use your own or someone else's voice. Instead of spending a lot of time and money on a studio session, you clone your voice once and generate any speech you want. Great for radio commercials, voice-overs or (in the future) podcasts.
One of the main players in this market is ElevenLabs. With their easy-to-use and low-cost platform, you can clone your voice in minutes, and the output is of astonishingly high quality. There are two main challenges in this process. The first is getting high-quality audio of yourself: a few minutes of phone recording works fine, although for the best results you want at least one to three hours of audio recorded with professional equipment.
The other challenge is getting emotion and intonation into your cloned voice. For example: if you upload an audio file of yourself reading a book, the model will be trained on your reading voice. An alternative is to record a conversation with someone, like in a podcast setting, and edit out the other person's parts in post.
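Once a voice is cloned, generating speech with it is a single HTTP call to ElevenLabs' text-to-speech endpoint. A hedged sketch of how such a request could be built; the endpoint shape reflects the API at the time of writing, and the `model_id` and voice-settings values are assumptions to verify against the ElevenLabs documentation:

```python
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

def build_tts_request(voice_id: str, text: str,
                      stability: float = 0.5,
                      similarity_boost: float = 0.75):
    """Build the URL and JSON body for an ElevenLabs text-to-speech call."""
    body = {
        "text": text,
        "model_id": "eleven_multilingual_v2",  # assumed model name; check the docs
        "voice_settings": {
            "stability": stability,            # lower = more expressive, less consistent
            "similarity_boost": similarity_boost,
        },
    }
    return ELEVENLABS_TTS_URL.format(voice_id=voice_id), body

url, body = build_tts_request("your-cloned-voice-id", "Hello from my cloned voice!")
# POST `body` as JSON to `url` with your key in an `xi-api-key` header;
# the response body is the generated audio (MP3 by default).
print(url)
```

The `stability` setting is one lever for the intonation problem described above: lowering it makes the delivery more varied and emotional, at the cost of consistency between takes.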
Mixing reality with the new Meta Quest 3
We are getting better and better at generating virtual worlds with technologies like image, text and audio generation. The next step is bringing these virtual assets into the real world with mixed reality devices like the new Meta Quest 3. It's a mixed reality headset that allows you to bring the virtual world into your real world, but how does it do this?
We have all seen or even experienced virtual reality (VR), think of an Oculus Rift for playing games, or augmented reality (AR), think of the Pokémon you saw through your screen in Pokémon GO. While these are really cool, mixed reality (MR) goes one step further. It combines VR and AR in one device like the Meta Quest 3, letting you seamlessly switch between the real world, the virtual world, or even both at the same time. This is also called spatial computing.
Quest 3 in a shop. Video: @CixLiv(X)
Quest 3 in an elevator. Video: @kukurio59 (TikTok)
The videos above show a person walking into a coffee shop and ordering something to drink, and someone watching a movie while waiting in an elevator. Seems pretty weird, right? Would you walk into public spaces with a headset like this on your face?
This highlights my biggest question about the Meta Quest 3: with its relatively low price tag of 550 euros, will a device like this be accepted in public? We are probably not there yet, but we might be getting closer to people walking around with MR headsets on their heads.
The innovation behind the Summer of Skate by McFlurry: Three.js for web-based 3D animations
Dive into the innovative technology behind the McDonald's 'Summer of Skate' campaign, where we used Three.js for a web-based 3D customization experience, enabling users to design and animate their own McDonald's merchandise in real time. Click here to open the article.
Thank you for joining us in this exploration of the new frontiers in AI and tech innovations. Stay tuned for more updates in the coming weeks, and feel free to share this digest within your network to spread the knowledge.
Stay Innovative,
LiveWall