Slicedit Supplementary


DDIM vs. DDPM Inversion

Columns (left to right): Original, TokenFlow DDIM, TokenFlow DDPM, Ours DDIM, Ours DDPM

A man is jumping → A shiny silver robot is jumping

A man is dancing → A shiny silver robot is dancing

TokenFlow performs inversion per frame. The second and third columns show TokenFlow results with per-frame DDIM inversion and per-frame DDPM inversion, respectively. Slicedit performs inversion over the whole video volume using the inflated denoiser. The fourth and fifth columns show Slicedit results with DDIM inversion and DDPM inversion over the whole volume, respectively.
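The difference between per-frame and whole-volume inversion can be illustrated with a minimal toy sketch. It covers the DDIM case only and is not the Slicedit or TokenFlow implementation. The two denoisers are hypothetical stand-ins, `a_t` and `a_next` are assumed to be cumulative noise-schedule products, and all function names are illustrative.

```python
import math

def ddim_invert_step(x_t, eps, a_t, a_next):
    """One deterministic DDIM inversion step, from a less noisy to a more noisy level.

    a_t and a_next are assumed cumulative alpha products at the current and next level.
    """
    # Predict the clean signal from the current sample and the noise estimate.
    x0_pred = [(x - math.sqrt(1 - a_t) * e) / math.sqrt(a_t) for x, e in zip(x_t, eps)]
    # Re-noise the prediction to the next level using the same noise estimate.
    return [math.sqrt(a_next) * x0 + math.sqrt(1 - a_next) * e
            for x0, e in zip(x0_pred, eps)]

def toy_frame_denoiser(frame):
    # Hypothetical stand-in for an image denoiser that sees one frame at a time.
    return [0.1 * v for v in frame]

def toy_volume_denoiser(video):
    # Hypothetical stand-in for a denoiser that sees all frames at once:
    # each frame's noise estimate also depends on the temporal mean.
    n = len(video)
    mean = [sum(f[i] for f in video) / n for i in range(len(video[0]))]
    return [[0.1 * v + 0.05 * m for v, m in zip(f, mean)] for f in video]

def invert_per_frame(video, a_t, a_next):
    # Each frame is inverted independently of the others.
    return [ddim_invert_step(f, toy_frame_denoiser(f), a_t, a_next) for f in video]

def invert_volume(video, a_t, a_next):
    # The noise estimate is computed once for the whole volume, then applied per frame.
    eps = toy_volume_denoiser(video)
    return [ddim_invert_step(f, e, a_t, a_next) for f, e in zip(video, eps)]

video = [[1.0, 2.0], [3.0, 4.0]]  # two toy "frames" with two "pixels" each
per_frame = invert_per_frame(video, 0.9, 0.8)
volume = invert_volume(video, 0.9, 0.8)
print(per_frame)
print(volume)
```

In this toy setting the two results differ only because the volume denoiser couples the frames through their temporal mean, whereas the per-frame path has no access to the other frames.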



Ours with Different DDIM Configuration

Columns (left to right): Original, CFG Scale 10, CFG Scale 15, CFG Scale 20

A man is doing parkour → A shiny silver robot is doing parkour

Slicedit uses DDPM inversion. For ablation purposes, we provide results of our method with DDIM inversion. As can be seen, with DDIM inversion our method fails to edit the video according to the text prompt at any of the tested classifier-free guidance (CFG) scales. Results are shown for CFG scales of 10, 15, and 20.
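For readers unfamiliar with the CFG scale, the following minimal sketch shows the standard classifier-free guidance combination of an unconditional and a text-conditioned noise prediction. The function name and the toy values are illustrative assumptions and are not taken from the Slicedit code.

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: move from the unconditional noise prediction
    toward the text-conditioned one, extrapolating by `scale`."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Toy noise predictions (illustrative values only).
eps_uncond = [0.0, 1.0, -1.0]
eps_cond = [1.0, 1.0, 0.0]

# The three scales used in the ablation above.
for scale in (10, 15, 20):
    print(scale, cfg_combine(eps_uncond, eps_cond, scale))
```

A scale of 1 returns the conditional prediction unchanged, and larger scales push the result further in the direction in which the text prompt changes the prediction.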


Limitations

A small dog is looking out a car window

A small elephant is looking out a car window

A car on the road

A tractor on the road