New upgrades for high-fidelity novel-view and 4D generation from a single video (Stability AI)

Stable Video 4D 2.0

We have upgraded Stable Video 4D (SV4D) to Stable Video 4D 2.0 (SV4D 2.0), which delivers higher-quality outputs on real-world videos. This multi-view video diffusion model is suited to dynamic 4D asset generation from a single object-centric video. The upgrades make it easier to create dynamic 4D assets for professional production workflows, from generating sprite sheets for game characters to supporting assets for film and virtual worlds.

Generating multi-view video remains difficult because of the inherent ambiguity of inferring a 3D object's appearance from viewpoints that were never observed. It is harder still when the subject is in motion. SV4D 2.0 makes incremental progress on this challenge by producing consistent, multi-angle outputs without relying on large datasets, multi-camera setups, or preprocessing. Although this is a step forward, occasional artifacts can still occur with dynamic motion.

What's new

We have made several upgrades in SV4D 2.0, including:

  • Sharper and more coherent 4D outputs: The model was trained in phases, starting with static 3D assets and then progressing to motion, which led to clearer and more consistent 4D results.

  • No reference views required: Works directly from a single video and eliminates the need for multi-view reference images.

  • Redesigned network architecture: Uses 3D attention, a mechanism that fuses 3D spatial and temporal features, improving spatial and temporal consistency without relying on reference views.

  • Improved generalization to real videos: Performs consistently on real-world videos. The model retains the prior knowledge of pretrained video models while being trained on synthetic data.
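
To give a feel for what fusing spatial (cross-view) and temporal (cross-frame) attention can look like, here is a toy NumPy sketch. It is not the SV4D 2.0 layer: the tensor layout, the function names, the absence of learned projections, and the blending weight `alpha` are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis.

    x has shape (..., L, D). Queries, keys and values are all x itself;
    a real layer would use learned projections (omitted here).
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ x               # (..., L, D)

def fused_view_frame_attention(latents, alpha=0.5):
    """Blend attention across views with attention across frames.

    latents: (V, F, N, D) = views, frames, tokens per image, channels.
    alpha:   hypothetical blending weight between the two branches.
    """
    # Attend across views: bring V to the sequence axis -> (F, N, V, D),
    # then restore the original (V, F, N, D) layout.
    across_views = self_attention(latents.transpose(1, 2, 0, 3))
    across_views = across_views.transpose(2, 0, 1, 3)

    # Attend across frames: bring F to the sequence axis -> (V, N, F, D),
    # then restore the original (V, F, N, D) layout.
    across_frames = self_attention(latents.transpose(0, 2, 1, 3))
    across_frames = across_frames.transpose(0, 2, 1, 3)

    return alpha * across_views + (1.0 - alpha) * across_frames
```

Calling `fused_view_frame_attention` on an array of shape `(4, 5, 3, 8)` returns an array of the same shape in which every token has been mixed with the corresponding tokens in the other views and the other frames. That mixing is the intuition behind enforcing consistency along both axes at once.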

Research and benchmarking

Our analysis shows that SV4D 2.0 achieves state-of-the-art results in 4D generation. It ranks first on all key benchmarks: LPIPS (image fidelity), FVD-V (multi-view consistency), FVD-F (temporal coherence), and FV4D (4D consistency). Compared to DreamGaussian4D, L4GM, and SV4D, this version generates sharper and more consistent 4D outputs.
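
The FVD-style metrics above are built on the Fréchet distance between two sets of video features, each summarized as a Gaussian; lower is better. The sketch below shows only that core formula in NumPy. The function name is hypothetical, and in a real FVD pipeline the features come from a pretrained video network rather than from raw arrays. How the benchmark groups frames and views for FVD-V, FVD-F and FV4D is not covered here.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim),
    with feature_dim >= 2 so that np.cov returns a matrix.
    Computes ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2)).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((C_a C_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of C_a C_b, which are real and non-negative for
    # covariance matrices; clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

Two identical feature sets give a distance of zero (up to round-off), and shifting every feature vector by a constant offset increases the distance by the squared length of that offset.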
