December 13, 2024
Stability AI turbocharges text-to-image generation with SDXL Turbo

Generating images with AI from a simple text prompt is getting faster, a whole lot faster, thanks to a new method adopted by Stability AI, creator of the widely used Stable Diffusion model.

No longer do users need to waste precious seconds, or even minutes, waiting for AI to generate an image from their prompt. With the new SDXL Turbo model announced this week by Stability AI, real-time image generation is now available to the masses.

This is thanks to a massive reduction in generation steps: what used to take 50 steps now takes one, which also reduces the compute load. According to Stability AI, SDXL Turbo can generate a 512×512 image in just 207ms on an A100 GPU, a major speed improvement over prior diffusion models.
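For readers who want to try that single-step setup themselves, here is a minimal sketch using the sdxl-turbo weights Stability AI posted on Hugging Face with the open-source diffusers library. The prompt and output file name are illustrative, and exact arguments may vary across diffusers versions:

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the SDXL Turbo weights from Hugging Face
# (released under a non-commercial research license).
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = pipe.to("cuda")

# A single denoising step replaces the ~50 steps a standard
# diffusion pipeline runs. SDXL Turbo is trained to work without
# classifier-free guidance, so guidance_scale is set to 0.
image = pipe(
    prompt="a cinematic photo of a fox in a snowy forest",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("fox.png")
```

Raising num_inference_steps slightly (the research evaluates one to four steps) trades a little of that speed for extra refinement.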

The overall SDXL Turbo experience is very much reminiscent of how Google (and other search, browser and operating system vendors) now enable predictive typing for queries, except this is for image generation at the speed of thought.

Sometimes faster speed comes from faster hardware, but that’s not the case here. It isn’t supercharged hardware that enables the turbo acceleration for SDXL; rather, it is a new technique that Stability AI has been researching, known as Adversarial Diffusion Distillation (ADD).

“One step Stable Diffusion XL with our new Adversarial Distilled Diffusion (ADD) approach,” Emad Mostaque, founder and CEO of Stability AI, wrote in a post on X (formerly Twitter). “Less diversity, but way faster & more variants to come which will be… interesting, particularly with upscales & more..”

SDXL – but faster!

The SDXL base model was first announced by Stability AI in July. At the time, Mostaque told VentureBeat that he expected it to serve as a solid base on which other models would be built. Stable Diffusion competes against multiple text-to-image generation models, including OpenAI’s DALL-E and Midjourney.

One of the key innovations that enables the original SDXL base model is the concept of ControlNets, which help create better control over image composition. The SDXL base model also benefits from 3.5 billion parameters, which Mostaque said provide better accuracy because the model is aware of more concepts.

SDXL Turbo builds on the innovations of the SDXL base model and makes generation faster.

With SDXL Turbo, Stability AI is following a path that is becoming increasingly common in modern generative AI development: first develop the most accurate model possible, then optimize it for performance. It’s the same path OpenAI has taken with GPT-3.5 Turbo and, more recently, GPT-4 Turbo.

Accelerating generative AI models often involves a tradeoff in quality and accuracy. That tradeoff is barely present in SDXL Turbo, which produces highly detailed results only marginally lower in image quality than the non-accelerated version of SDXL.

What is Adversarial Diffusion Distillation (ADD)?

In AI, the concept of a Generative Adversarial Network (GAN) is well understood and is used to help build deep learning networks that can generate outputs rapidly. For image generation, Stable Diffusion is built around a diffusion model, a type of model that takes a more iterative approach to content generation and typically isn’t nearly as fast as GAN-based AI. ADD takes the best of both worlds.

“The aim of this work is to combine the superior sample quality of DMs [diffusion models] with the inherent speed of GANs,” the ADD research report states.

The ADD approach developed by Stability AI researchers is an attempt to outperform other AI approaches to image generation. According to the researchers, ADD is the first method to unlock single-step, real-time image synthesis with foundation models.

ADD uses a combination of adversarial training and score distillation to leverage knowledge from a pretrained image diffusion model. The key benefits are fast sampling that retains high fidelity, the ability to iteratively refine outputs, and the reuse of Stable Diffusion pretraining.
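In rough terms, that means the one-step student network is trained against two signals at once: a GAN-style discriminator that pushes its outputs to look real, and a frozen pretrained diffusion teacher whose predictions it must match. Below is a heavily simplified, conceptual sketch of such a combined objective; the module names, the single-call teacher and the weighting term lam are illustrative stand-ins, not Stability AI’s actual implementation (which operates on noised latents with a distillation weighting schedule):

```python
import torch
import torch.nn.functional as F

def add_style_generator_loss(student, teacher, discriminator, batch_shape, lam=1.0):
    """Simplified ADD-flavored generator objective:
    adversarial term + score-distillation term.
    All modules here are conceptual stand-ins."""
    # The student maps noise to an image in a single forward pass.
    noise = torch.randn(batch_shape)
    x_student = student(noise)

    # Adversarial term: encourage outputs the discriminator scores
    # as real (this is where GAN-like speed and sharpness come from).
    adv_loss = -discriminator(x_student).mean()

    # Distillation term: match the frozen teacher diffusion model's
    # prediction for the same noise (this preserves the diffusion
    # model's fidelity).
    with torch.no_grad():
        x_teacher = teacher(noise)
    distill_loss = F.mse_loss(x_student, x_teacher)

    # The total loss balances sharpness against faithfulness to the teacher.
    return adv_loss + lam * distill_loss
```

The discriminator is trained in alternation, as in a standard GAN, while the teacher stays frozen throughout, which is how the student inherits the pretrained Stable Diffusion knowledge in a single sampling step.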

Experiments conducted by the researchers show ADD significantly outperforms GANs, Latent Consistency Models, and other diffusion distillation methods in 1-4 steps.

The SDXL Turbo model is not yet ready for commercial use, according to Stability AI, though it is already available in preview on the company’s Clipdrop web service.

In limited testing by VentureBeat, image generation was certainly fast, though the Clipdrop beta (at least for now) lacks some of the more advanced parameter options for generating images in different styles. Stability AI has also made the code and model weights available on Hugging Face under a non-commercial research license.
