23: Impressive photorealistic avatars with Gaussian splats, full song creations and free AI

May 31, 2024

This week we’ll take a look at a new way to create photorealistic avatars with Gaussian splats, longer songs from Udio, and ChatGPT opening up to free users. Let’s go!

NPGA: Neural Parametric Gaussian Avatars

Creating realistic digital versions of human heads is a key step in adding virtual elements to our daily lives. This is a tough research problem because these avatars need to look very real and work smoothly in real time. In this study, a research team introduces Neural Parametric Gaussian Avatars (NPGA), a new method to make high-quality, controllable avatars using multi-view video recordings.

Their approach uses 3D Gaussian Splatting, which renders quickly and benefits from the topological flexibility of point clouds. Unlike older methods that drive avatars with mesh-based 3D models, NPGA bases the avatars' movements on neural parametric head models (NPHM), which offer a richer range of expressions. Because the head model's deformation field runs backward, they distill it into forward deformations that are compatible with rasterization-based rendering. The finer, expression-dependent details are learned from the video recordings.

To make these avatars even more expressive, they augment the canonical Gaussian point cloud with per-primitive latent features that govern how it moves. To keep this increased expressiveness in check, they apply Laplacian regularization terms to these features and to the predicted movements. They tested their method on the public NeRSemble dataset, showing that NPGA outperforms previous methods on the self-reenactment task by roughly 2.6 dB PSNR. Additionally, they demonstrated that their avatars can be accurately animated from real-world single-camera videos.
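To put the PSNR number in context: PSNR (peak signal-to-noise ratio) is measured in decibels and is derived from the mean squared error between a rendered image and the ground-truth image. Here is a minimal sketch in plain Python, assuming pixel values in the range 0 to 1 stored as flat lists. The function names are illustrative and are not taken from the NPGA paper or its code.

```python
import math

def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(rendered) != len(reference):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)

if __name__ == "__main__":
    print(round(mse_ratio_for_gain(2.6), 2))
```

Because the scale is logarithmic, a gain of 2.6 dB corresponds to a mean squared error that is roughly 1.8 times lower, which is a substantial jump for a rendering benchmark.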

Everything You Need To Know About Udio AI (Text To Audio)

Updates to Udio: it can now create two-minute songs.

Udio, a popular AI music generation tool, has launched new features, starting with a model that can generate two-minute tracks. This makes it easier to create music with a consistent structure and flow.

The two-minute model is available alongside the existing one. Initially, this is an experimental feature offered at a discounted rate for pro subscribers, but it will be available to everyone in the coming weeks.

New controls have been added to the 'advanced features' dropdown:

Random Seed Setting: You can now set the random seed to make clips reproducible (in manual mode). Using the same seed while changing the prompt or lyrics can help keep certain features without explicitly prompting for them.
Prompt or Lyrics Strength: You can control how much the prompts or lyrics influence the output. Higher prompt strength means the music will more closely follow the prompt, but may sound less natural. Lower lyrics strength can make vocals sound more natural, but sometimes the lyrics might be ignored.
Clip Start-Time: You can choose where the generated clip starts in a full song. For example, 0% starts at the beginning, 50% in the middle, and 90% near the end. This is useful for starting a track from an intro or when using the extension feature.
Generation-Quality Slider: This lets you trade quality for speed. At faster settings you can explore ideas more quickly without losing too much quality.
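The random seed control is easier to appreciate with a toy example. Udio has not published how its model works, so the sketch below is only a generic illustration of seeded sampling in Python. `toy_generate` and its prompt "conditioning" are hypothetical stand-ins, not Udio's API.

```python
import random

def toy_generate(prompt, seed, length=8):
    """Toy stand-in for a generative model: seeded noise plus a prompt-dependent offset."""
    rng = random.Random(seed)  # all randomness comes from this seeded generator
    noise = [rng.random() for _ in range(length)]
    # Hypothetical 'conditioning': a deterministic offset derived from the prompt text.
    offset = (sum(ord(c) for c in prompt) % 10) / 10.0
    return [n + offset for n in noise]

if __name__ == "__main__":
    a = toy_generate("jazz", seed=7)
    b = toy_generate("jazz", seed=7)
    c = toy_generate("rock", seed=7)
    print(a == b)  # same seed and prompt: identical output
    # Same seed, different prompt: the underlying noise pattern is shared.
    print([round(x - y, 6) for x, y in zip(a, c)])
```

With the seed fixed, the same prompt always gives the same result, and changing only the prompt shifts the output while the underlying random pattern stays the same. That is the intuition behind keeping a seed to preserve certain features of a clip.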

OpenAI opens up ChatGPT for free users

Earlier this month OpenAI announced GPT-4o, its latest flagship model, offering the intelligence of GPT-4 but much faster and with improved capabilities in text, voice, and vision. This week they have started opening up the previously restricted capabilities to free users.

When using GPT-4o, ChatGPT Free users will now have access to features such as:

Experience GPT-4 level intelligence
Get responses from both the model and the web
Analyze data and create charts
Chat about photos you take
Upload files for assistance summarizing, writing or analyzing
Discover and use GPTs and the GPT Store
Build a more helpful experience with Memory
