
Comparison between DDLP and SlotFormer

  • We compare DDLP to two state-of-the-art (SOTA) object-centric models: G-SWM (a patch-based model) and SlotFormer (a slot-based model).
  • We provide visual comparisons on videos at \(128 \times 128\) resolution.
  • Note that both G-SWM and SlotFormer were originally trained on \(64 \times 64\) videos. For a quantitative comparison against their publicly available pre-trained \(64 \times 64\) models, please refer to our paper.
  • Video comparisons with G-SWM are available under the Video Prediction section.



