Technion – Israel Institute of Technology
Editing real images using a pre-trained text-to-image (T2I) diffusion/flow model often involves inverting the image into its corresponding noise map. However, inversion by itself is typically insufficient for obtaining satisfactory results, and therefore many methods additionally intervene in the sampling process. Such methods achieve improved results but are not seamlessly transferable between model architectures. Here, we introduce FlowEdit, a text-based editing method for pre-trained T2I flow models that is inversion-free, optimization-free, and model-agnostic. Our method constructs an ODE that directly maps between the source and target distributions (corresponding to the source and target text prompts) and achieves a lower transport cost than the inversion approach. This leads to state-of-the-art results, as we illustrate with Stable Diffusion 3 and FLUX.
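The inversion-free idea can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not the official implementation (see the official repository). `velocity` is a hypothetical closed-form stand-in for an SD3/FLUX velocity network: it is the exact rectified-flow velocity for toy Gaussian "images" whose mean `mu` plays the role of the text prompt. Noising follows the rectified-flow convention Z_t = (1 - t) X + t N. `n_avg` is a hypothetical knob that averages the editing direction over several noise draws. The sketch leaves out any scheduling details the official code may use.

```python
import numpy as np

S = 0.5  # std of the toy Gaussian "image" distribution (assumption)


def velocity(z, t, mu, s=S):
    """Hypothetical stand-in for a T2I flow model V(z, t, c).

    Exact rectified-flow velocity E[N - X | Z_t = z] for toy data
    X ~ N(mu, s^2 I), noise N ~ N(0, I) and Z_t = (1 - t) X + t N.
    The mean `mu` plays the role of the text prompt c.
    """
    var = (1 - t) ** 2 * s**2 + t**2   # Var(Z_t) per coordinate
    cov = t - (1 - t) * s**2           # Cov(N - X, Z_t) per coordinate
    return -mu + (cov / var) * (z - (1 - t) * mu)


def flowedit_sketch(x_src, mu_src, mu_tar, n_steps=50, n_avg=1, seed=0):
    """Inversion-free editing sketch: integrate the difference between the
    target- and source-conditioned velocities along a path that starts at
    the source image and never passes through pure noise."""
    rng = np.random.default_rng(seed)
    x_src = np.asarray(x_src, dtype=float)
    ts = np.linspace(1.0, 0.0, n_steps + 1)  # from t = 1 (noise level) down to t = 0 (image)
    z_fe = x_src.copy()                       # the direct path starts at the source image
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v_delta = np.zeros_like(z_fe)
        for _ in range(n_avg):
            noise = rng.standard_normal(x_src.shape)
            z_src = (1 - t_cur) * x_src + t_cur * noise  # freshly noised source image
            z_tar = z_fe + z_src - x_src                 # same noise offset applied to the current edit
            v_delta += velocity(z_tar, t_cur, mu_tar) - velocity(z_src, t_cur, mu_src)
        # Euler step of dZ/dt = V_tar - V_src; t_next < t_cur, so this moves toward t = 0
        z_fe = z_fe + (t_next - t_cur) * v_delta / n_avg
    return z_fe


x = np.array([0.1, -0.3, 0.4])
edited = flowedit_sketch(x, mu_src=0.0, mu_tar=2.0)  # close to x + 2
```

With these toy velocity fields the noise terms cancel exactly, so the edit lands on x_src + (mu_tar - mu_src) for any seed, and the path z_fe = x_src + (1 - t)(mu_tar - mu_src) stays noise-free throughout. With a real network the velocities are not affine in z and the noise does not cancel, which is what `n_avg` is meant to address in this sketch.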
See the following video for visual intuition about our method. The video includes narration.
The following figure illustrates the main idea behind our method:
(a) In inversion-based editing, the source image $Z_0^{\mathrm{src}}$ is first mapped to the noise space by solving the forward ODE conditioned on the source prompt (left path). Then, the extracted noise is used to solve the reverse ODE conditioned on the target prompt to obtain $Z_0^{\mathrm{tar}}$ (right path). The images at the bottom visualize this transition.
(b) We reinterpret inversion as a direct path between the source and target distributions (bottom path). This is done by using the velocities computed during inversion and sampling (green and red arrows) to obtain an editing direction (orange arrow) that drives the evolution of the direct path $Z_t^{\mathrm{inv}}$ through an ODE. The resulting path is noise-free, as the images at the bottom show.
(c) FlowEdit traverses a
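Panels (a) and (b) can be written out explicitly as follows. The notation is ours, not the page's. $t = 0$ denotes the image and $t = 1$ denotes noise, and $V(\cdot, t, c)$ is the model's velocity field under text condition $c$. The direct path is the source image plus the difference between the sampling and inversion trajectories. Its endpoints are the source image and the inversion-based edit, and its velocity is exactly the editing direction (target velocity minus source velocity).

```latex
\begin{aligned}
\frac{dZ^{\mathrm{src}}_t}{dt} &= V\big(Z^{\mathrm{src}}_t, t, c_{\mathrm{src}}\big), && Z^{\mathrm{src}}_0 \text{ is the source image}, \\
\frac{dZ^{\mathrm{tar}}_t}{dt} &= V\big(Z^{\mathrm{tar}}_t, t, c_{\mathrm{tar}}\big), && Z^{\mathrm{tar}}_1 = Z^{\mathrm{src}}_1, \\
Z^{\mathrm{inv}}_t &= Z^{\mathrm{src}}_0 + Z^{\mathrm{tar}}_t - Z^{\mathrm{src}}_t, \\
\frac{dZ^{\mathrm{inv}}_t}{dt} &= V\big(Z^{\mathrm{tar}}_t, t, c_{\mathrm{tar}}\big) - V\big(Z^{\mathrm{src}}_t, t, c_{\mathrm{src}}\big), \\
Z^{\mathrm{inv}}_1 &= Z^{\mathrm{src}}_0, \qquad Z^{\mathrm{inv}}_0 = Z^{\mathrm{tar}}_0 .
\end{aligned}
```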
A bicycle parked next to a red brick building → A vespa parked next to a red brick building
A rabbit sitting in a field with flowers → A puppy sitting in a field with flowers
A glass of milk → A glass of beer
A restaurant called Luna → A restaurant called Sol
A woman meditating → A wooden statue meditating
A cat wearing a crown → A cat wearing a top hat
A coconut shell filled with splashing water → A baseball shell filled with splashing water
A wolf standing on a cliff, howling → A Husky standing on a cliff, looking
A horse in the field → A pink toy horse in the field
Two penguins → Two origami penguins
Clownfish swimming in a reef → Goldfish swimming in a reef
A dog in the snow → A deer in the snow
A gas station with a CAFE sign → A gas station with a CVPR sign
A large tiger standing in a swamp → A large lion standing in a swamp
A tall white lighthouse, illuminated by bright light → The Big Ben, illuminated by bright light
A colorful parrot perching on a tree branch → A gray pigeon perching on a tree branch
A three layer cake decorated with fruits → A three layer cake decorated with strawberries
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, Tomer Michaeli.
Our code is available in the official GitHub repository.