Synthetic Media & Generative AI News and Discussions

Yuli Ban · Post by **Yuli Ban** » Thu Sep 29, 2022 8:52 pm

Abstract
Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs. Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D assets and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. We introduce a loss based on probability density distillation that enables the use of a 2D diffusion model as a prior for optimization of a parametric image generator. Using this loss in a DeepDream-like procedure, we optimize a randomly-initialized 3D model (a Neural Radiance Field, or NeRF) via gradient descent such that its 2D renderings from random angles achieve a low loss. The resulting 3D model of the given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
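
For anyone who wants to see the idea behind the distillation loss in the abstract above, here is a toy sketch. It uses the update form from the paper this abstract comes from (score distillation sampling): render, add noise, ask the frozen denoiser to predict that noise, then push the difference back into the generator's parameters. Everything concrete here is an assumption made for illustration: the "3D model" is a single number `theta`, the "renderer" is the identity, the weighting `w(t)` is set to 1, and the pretrained text-to-image model is replaced by the closed-form optimal noise predictor for a 1-D Gaussian "image" distribution with mean `mu`. It is not the authors' implementation.

```python
import math
import random


def optimal_eps_prediction(z_t, alpha, sigma, mu, data_std):
    """Closed-form E[eps | z_t] when clean data ~ N(mu, data_std^2).

    Stand-in (assumption) for the frozen pretrained diffusion model.
    """
    var_z = alpha ** 2 * data_std ** 2 + sigma ** 2
    return sigma * (z_t - alpha * mu) / var_z


def sds_optimize(theta0, mu=2.0, data_std=0.5, lr=0.05, steps=2000, seed=0):
    """Toy score-distillation loop on a one-parameter 'generator'."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(steps):
        # Sample a noise level; alpha^2 + sigma^2 = 1 (variance preserving).
        t = rng.uniform(0.02, 0.98)
        alpha = math.cos(0.5 * math.pi * t)
        sigma = math.sin(0.5 * math.pi * t)

        x = theta                      # "render": identity, so dx/dtheta = 1
        eps = rng.gauss(0.0, 1.0)      # noise injected into the rendering
        z_t = alpha * x + sigma * eps  # noised rendering

        eps_hat = optimal_eps_prediction(z_t, alpha, sigma, mu, data_std)

        # Gradient estimate: w(t) * (eps_hat - eps) * dx/dtheta,
        # with w(t) = 1 and dx/dtheta = 1 in this toy setup.
        grad = eps_hat - eps
        theta -= lr * grad
    return theta


if __name__ == "__main__":
    print(sds_optimize(theta0=-3.0))
```

In this toy setting the parameter drifts from its starting value toward `mu`, the mode of the stand-in prior, without ever differentiating through the denoiser. In the real method the same kind of gradient would flow through a differentiable NeRF renderer instead of the identity.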

Yuli Ban · Post by **Yuli Ban** » Sun Oct 02, 2022 11:40 pm

Yuli Ban · Post by **Yuli Ban** » Tue Oct 04, 2022 10:03 pm

ººº · Post by **ººº** » Tue Oct 04, 2022 11:50 pm

Yuli Ban wrote: ↑Tue Oct 04, 2022 10:03 pm

As expected they had to show the most boring examples.

Yuli Ban · Post by **Yuli Ban** » Tue Oct 04, 2022 11:54 pm

Big corporations like Meta basically HAVE to show off this tech using either "cute animals wearing funny clothes" or "landscapes and abstract objects" because they don't want to invite controversy early by showing off humans. Journalists are watching this technology like hawks, primed to ask the question "But what about the potential for abuse?" Which, to be fair, is a good question to ask.
Alas, wait until Stability releases THEIR text-to-video AI in the coming months to see a less filtered version.

ººº · Post by **ººº** » Wed Oct 05, 2022 12:12 am

^ I would actually like to see "abstract" prompts.

Yuli Ban · Post by **Yuli Ban** » Wed Oct 05, 2022 6:23 pm

HOLY SH....OOZIES.

"Imagen Video": Google announces video version of Imagen (Ho et al 2022)

Abstract
We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model including design decisions such as the choice of fully-convolutional temporal and spatial super-resolution models at certain resolutions, and the choice of the v-parameterization of diffusion models. In addition, we confirm and transfer findings from previous work on diffusion-based image generation to the video generation setting. Finally, we apply progressive distillation to our video models with classifier-free guidance for fast, high quality sampling. We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding.
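
Two of the ingredients named in the abstract, the v-parameterization and classifier-free guidance, are small enough to sketch. The snippet below assumes a variance-preserving schedule (`alpha**2 + sigma**2 == 1`) and the common convention `v = alpha * eps - sigma * x`; the guidance formula uses the "unconditional plus weight times difference" convention, which is one of several in use. Function names are made up for illustration and scalars stand in for video tensors, so treat this as a sketch rather than the Imagen Video code.

```python
import math


def v_target(x, eps, alpha, sigma):
    """Velocity target the network is trained to predict (assumed convention)."""
    return alpha * eps - sigma * x


def x_from_v(z, v, alpha, sigma):
    """Recover the clean sample from the noised sample z and predicted v."""
    return alpha * z - sigma * v


def eps_from_v(z, v, alpha, sigma):
    """Recover the noise from the noised sample z and predicted v."""
    return sigma * z + alpha * v


def guide(cond, uncond, weight):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the text-conditional one. weight = 1 returns cond."""
    return uncond + weight * (cond - uncond)


if __name__ == "__main__":
    alpha, sigma = math.cos(0.3), math.sin(0.3)
    x, eps = 0.7, -1.2
    z = alpha * x + sigma * eps  # forward noising step
    v = v_target(x, eps, alpha, sigma)
    print(x_from_v(z, v, alpha, sigma), eps_from_v(z, v, alpha, sigma))
```

Because both recoveries are linear in `v`, guidance can be applied to the v-prediction directly; a weight above 1 pushes samples harder toward the prompt.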

I AM SPEECHLESS

ººº · Post by **ººº** » Wed Oct 05, 2022 6:42 pm

Yuli Ban wrote: ↑Wed Oct 05, 2022 6:23 pm

HOLY SH....OOZIES.

"Imagen Video": Google announces video version of Imagen (Ho et al 2022)
Abstract
We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model including design decisions such as the choice of fully-convolutional temporal and spatial super-resolution models at certain resolutions, and the choice of the v-parameterization of diffusion models. In addition, we confirm and transfer findings from previous work on diffusion-based image generation to the video generation setting. Finally, we apply progressive distillation to our video models with classifier-free guidance for fast, high quality sampling. We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding.
I AM SPEECHLESS

https://imagen.research.google/video/hdvideos/4.mp4
https://imagen.research.google/video/hdvideos/5.mp4
https://imagen.research.google/video/hdvideos/9.mp4
https://imagen.research.google/video/hdvideos/46.mp4
https://imagen.research.google/video/hdvideos/50.mp4

Yuli Ban · Post by **Yuli Ban** » Wed Oct 05, 2022 8:04 pm

ººº · Post by **ººº** » Wed Oct 05, 2022 8:47 pm

So if I got it right:

Make-A-Video: Best image quality

Phenaki: Longest video length

Imagen Video: Can generate readable text (phrases) inside the video

Yuli Ban · Post by **Yuli Ban** » Wed Oct 05, 2022 9:39 pm

Yuli Ban · Post by **Yuli Ban** » Fri Oct 07, 2022 3:59 am

Generating realistic audio requires modeling information represented at different scales. For example, just as music builds complex musical phrases from individual notes, speech combines temporally local structures, such as phonemes or syllables, into words and sentences. Creating well-structured and coherent audio sequences at all these scales is a challenge that has been addressed by coupling audio with transcriptions that can guide the generative process, be it text transcripts for speech synthesis or MIDI representations for piano. However, this approach breaks when trying to model untranscribed aspects of audio, such as speaker characteristics necessary to help people with speech impairments recover their voice, or stylistic components of a piano performance.

In “AudioLM: a Language Modeling Approach to Audio Generation”, we propose a new framework for audio generation that learns to generate realistic speech and piano music by listening to audio only. Audio generated by AudioLM demonstrates long-term consistency (e.g., syntax in speech, melody in music) and high fidelity, outperforming previous systems and pushing the frontiers of audio generation with applications in speech synthesis or computer-assisted music. Following our AI Principles, we've also developed a model to identify synthetic audio generated by AudioLM.

Yuli Ban · Post by **Yuli Ban** » Fri Oct 07, 2022 4:03 am

Yuli Ban · Post by **Yuli Ban** » Mon Oct 17, 2022 4:11 am

Yuli Ban · Post by **Yuli Ban** » Wed Oct 19, 2022 10:39 pm

Yuli Ban · Post by **Yuli Ban** » Wed Oct 19, 2022 11:42 pm

Yuli Ban · Post by **Yuli Ban** » Thu Oct 20, 2022 11:35 pm

Who could have possibly foreseen this?

New AI image-generation systems make headlines every day, but that revolution started many years ago. Now one of the most established services for AI face generation has expanded its offering to include full body images. An early use of computer-made faces was for news stories, video games, and documentaries when a person was needed to convey an idea or represent an unknown individual for whom no photo was available. Keeping a stock library of faces isn’t too difficult for an agency, but standing poses are harder since the type of clothing affects the possible uses of the images. In the past, one or more models would need to be hired for these types of shots.

Yuli Ban · Post by **Yuli Ban** » Mon Oct 24, 2022 8:25 am

The music industry’s lobbying arm claims that services using machine learning to alter tracks are infringing on artists’ rights.

As first reported by TorrentFreak, the Recording Industry Association of America listed AI-powered music websites that make remixes, improve homemade tracks, or strip songs of vocals or instrumentals as harmful to artists, in a response to a request from the Office of the US Trade Representative.

Artists working within all kinds of media have raised concerns in recent years—and increasingly, with the rising popularity of text-to-image generators like DALL-E—about whether AI-generated art infringes on individuals’ copyright. Most AI content generators depend on datasets that are filled with original artworks, texts, or audio, and use those original works without the owners’ permission.

Yuli Ban · Post by **Yuli Ban** » Thu Oct 27, 2022 5:07 am

Yuli Ban · Post by **Yuli Ban** » Sat Oct 29, 2022 10:09 pm
