A man is jumping → A shiny silver robot is jumping
A man is dancing → A shiny silver robot is dancing
TokenFlow performs inversion per frame: the second column shows TokenFlow results with per-frame DDIM inversion, and the third column shows TokenFlow results with per-frame DDPM inversion. Slicedit performs inversion with the inflated denoiser over the whole volume: the fourth column shows Slicedit results with DDIM inversion over the whole volume, and the fifth column shows Slicedit results with DDPM inversion over the whole volume.
A man is doing parkour → A shiny silver robot is doing parkour
Slicedit uses DDPM inversion. For ablation purposes, we provide results of our method with DDIM inversion. As can be seen, with DDIM inversion our method fails to edit the video according to the text prompt at any of the tested classifier-free guidance (CFG) scales. Results are given for CFG scales of 10, 15, and 20.
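For readers unfamiliar with the CFG scale, the following is a minimal sketch of the standard classifier-free guidance combination, in which the guided noise prediction extrapolates from the unconditional prediction toward the text-conditioned one. It is not taken from the Slicedit implementation: the function name `cfg_combine` and the toy prediction values are hypothetical and serve only to show how scales of 10, 15, and 20 amplify the text-conditioned direction.

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance rule (hypothetical helper name).

    Returns eps_uncond + scale * (eps_cond - eps_uncond), element-wise.
    scale = 0 gives the unconditional prediction, scale = 1 the conditional
    one, and larger scales push further along the text-conditioned direction.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions (illustrative values only, not model outputs).
eps_uncond = [0.0, 1.0, -1.0]
eps_cond = [0.5, 1.0, -2.0]

# The three CFG scales used in the ablation.
for scale in (10, 15, 20):
    print(scale, cfg_combine(eps_uncond, eps_cond, scale))
```

Components on which the two predictions agree are left unchanged at every scale, while components on which they differ are amplified in proportion to the scale.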
A small dog is looking out a car window
A small elephant is looking out a car window
A car on the road
A tractor on the road