Comparison

Comparison between DDLP and SlotFormer

We compare DDLP to two state-of-the-art object-centric models: G-SWM (a patch-based model) and SlotFormer (a slot-based model).
We provide visual comparisons on videos at \(128 \times 128\) resolution.
Note that both G-SWM and SlotFormer were originally trained on \(64 \times 64\) videos. For a quantitative comparison with the publicly available pre-trained models (\(64 \times 64\)), please refer to our paper.
Video comparisons with G-SWM are available under the Video Prediction section.

DDLP