What If...?

Video manipulation and consequence prediction with DDLP

DDLP provides an interpretable latent representation allowing to directly modify scenes in the latent space by changing objects’ learned attributes – position, scale, depth or shape.
Equipped with a pre-trained DDLP on OBJ3D, we take an initial sequence of frames, encode them to latent particles and manually locate and modify specific particles corresponding to objects-of-interest according to a specific scenario, e.g., ``what if the green ball moved to the left?’’.

Then, we unroll the latent sequence with the dynamics module and decode to produce a video of the consequences of our interventions.