Slicedit Supplementary

DDIM vs. DDPM Inversion

Columns, left to right: Original, TokenFlow DDIM, TokenFlow DDPM, Ours DDIM, Ours DDPM

A man is jumping → A shiny silver robot is jumping

A man is dancing → A shiny silver robot is dancing

TokenFlow performs inversion per frame. The second column shows TokenFlow results with per-frame DDIM inversion, and the third column shows TokenFlow results with per-frame DDPM inversion. Slicedit instead performs inversion with the inflated denoiser over the whole volume. The fourth column shows Slicedit results with DDIM inversion over the whole volume, and the fifth column shows Slicedit results with DDPM inversion over the whole volume.
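To make the distinction between the two inversion types concrete, the following is a minimal NumPy sketch, not the Slicedit or TokenFlow implementation. It assumes the standard DDPM sampler with variance beta_t and the "edit-friendly" form of DDPM inversion, in which every noise level is sampled independently and the per-step noise maps are then solved for; whether Slicedit uses exactly this variant is an assumption here. The denoiser `toy_eps`, the noise schedule, and the volume shape (frames, height, width) are hypothetical placeholders. In particular, `toy_eps` acts elementwise and does not model the inflated denoiser.

```python
import numpy as np

def toy_eps(x, t):
    """Hypothetical stand-in for a noise-prediction network eps_theta(x_t, t)."""
    return np.tanh(x) * (0.1 + 0.01 * t)

T = 50
# betas[t] for t = 1..T (index 0 is a placeholder so that abar[0] = 1).
betas = np.concatenate([[0.0], np.linspace(1e-4, 2e-2, T)])
abar = np.cumprod(1.0 - betas)

def ddpm_mean(x_t, t, eps_fn):
    """Mean of the DDPM reverse step x_t -> x_{t-1}."""
    eps = eps_fn(x_t, t)
    return (x_t - betas[t] / np.sqrt(1.0 - abar[t]) * eps) / np.sqrt(1.0 - betas[t])

def ddpm_invert(x0, eps_fn, rng):
    """Edit-friendly style DDPM inversion (assumed variant).

    Each x_t is noised independently from x0, then the noise map z_t that
    carries x_t to x_{t-1} under the sampler is solved for explicitly.
    """
    xs = [x0]
    for t in range(1, T + 1):
        noise = rng.standard_normal(x0.shape)
        xs.append(np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * noise)
    zs = {t: (xs[t - 1] - ddpm_mean(xs[t], t, eps_fn)) / np.sqrt(betas[t])
          for t in range(1, T + 1)}
    return xs[T], zs

def ddpm_sample(x_T, zs, eps_fn):
    """Run the DDPM sampler from x_T while reusing the extracted noise maps."""
    x = x_T
    for t in range(T, 0, -1):
        x = ddpm_mean(x, t, eps_fn) + np.sqrt(betas[t]) * zs[t]
    return x

def ddim_step(x, t_from, t_to, eps_fn):
    """Deterministic DDIM move between two noise levels (either direction)."""
    eps = eps_fn(x, t_from)
    x0_pred = (x - np.sqrt(1.0 - abar[t_from]) * eps) / np.sqrt(abar[t_from])
    return np.sqrt(abar[t_to]) * x0_pred + np.sqrt(1.0 - abar[t_to]) * eps

def ddim_round_trip(x0, eps_fn):
    """DDIM inversion (0 -> T) followed by DDIM sampling (T -> 0)."""
    x = x0
    for t in range(0, T):
        x = ddim_step(x, t, t + 1, eps_fn)
    for t in range(T, 0, -1):
        x = ddim_step(x, t, t - 1, eps_fn)
    return x

rng = np.random.default_rng(0)
volume = rng.standard_normal((4, 8, 8))  # hypothetical (frames, height, width)

x_T, zs = ddpm_invert(volume, toy_eps, rng)
ddpm_err = np.max(np.abs(ddpm_sample(x_T, zs, toy_eps) - volume))
ddim_err = np.max(np.abs(ddim_round_trip(volume, toy_eps) - volume))
print("DDPM round-trip error:", ddpm_err)
print("DDIM round-trip error:", ddim_err)
```

In this sketch the DDPM round trip reproduces the input up to floating-point error by construction, because the stored noise maps absorb whatever the denoiser gets wrong, whereas the DDIM round trip is only approximate in general. Passing a whole volume rather than a single frame changes only the array shape here; with a real inflated denoiser it would also change what the network sees at each step.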

Ours with Different DDIM Configuration

Columns, left to right: Original, CFG Scale 10, CFG Scale 15, CFG Scale 20

A man is doing parkour → A shiny silver robot is doing parkour

Slicedit uses DDPM inversion. As an ablation, we provide results of our method with DDIM inversion instead. As can be seen, with DDIM inversion our method fails to edit the video according to the text prompt at any of the tested classifier-free guidance (CFG) scales: 10, 15, and 20.
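For readers unfamiliar with the CFG scale, the following short sketch shows the standard classifier-free guidance combination of an unconditional and a text-conditioned noise prediction. The arrays are random placeholders rather than outputs of any real denoiser, and the function name `cfg_noise` is hypothetical.

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward (and past) the text-conditioned one by a factor of `scale`."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal((4, 8, 8))  # placeholder predictions
eps_cond = rng.standard_normal((4, 8, 8))

# The three scales used in the ablation above.
for scale in (10, 15, 20):
    guided = cfg_noise(eps_uncond, eps_cond, scale)
    print(scale, float(np.linalg.norm(guided - eps_uncond)))
```

A scale of 1 returns the conditioned prediction unchanged, and larger scales extrapolate further along the direction from the unconditional to the conditioned prediction.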

Limitation

A small dog is looking out a car window

A small elephant is looking out a car window

A car on the road

A tractor on the road