Input Video
Generated 360° Panorama
Input Video
Generated 360° Panorama
Input Video
Generated 360° Panorama
Input Video
Generated 360° Panorama
Input Video
Generated 360° Panorama
Input Video
Generated 360° Panorama
Input Video
Generated 360° Panorama
Input Video
Generated 360° Panorama
We test Argus on in-the-wild videos capturing everyday activities to verify its robustness. The input region is highlighted in red. As shown, Argus can generate long-term, immersive, and realistic 360° videos from real-world perspective inputs.
360° videos have emerged as a promising medium to represent our dynamic visual world. Compared to the "tunnel vision" of standard cameras, their borderless field of view offers a more holistic perspective of our surroundings. However, while existing video models excel at producing standard videos, their ability to generate full panoramic videos remains elusive. In this paper, we investigate the task of video-to-360° generation: given a perspective video as input, our goal is to generate a full panoramic video that is coherent with the input. Unlike conventional video generation tasks, the output's field of view is significantly larger, and the model is required to have a deep understanding of both the spatial layout of the scene and the dynamics of objects to maintain geometric and dynamic consistency with the input. To address these challenges, we first leverage the abundant 360° videos available online and develop a high-quality data filtering pipeline to curate pairwise training data. We then carefully design a series of geometry- and motion-aware modules to facilitate the learning process and improve the quality of 360° video generation. Experimental results demonstrate that our model can generate realistic and coherent 360° videos from arbitrary, in-the-wild perspective inputs. Additionally, we showcase its potential applications, including video stabilization, camera viewpoint control, and interactive visual question answering.
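The geometry underlying the task: every pixel of an equirectangular panorama corresponds to a viewing direction on the full sphere, whereas a perspective camera covers only a small cap of it. A minimal sketch of the pixel-to-direction mapping (the coordinate conventions here are our own illustrative choice, not the paper's):

```python
import numpy as np

def equirect_dir(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit viewing direction.

    Longitude spans [-pi, pi) across the width; latitude spans
    [pi/2, -pi/2] top-to-bottom, so the image covers the full sphere.
    """
    lon = (u / width) * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v / height) * np.pi
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.array([x, y, z])
```

The center pixel maps to the forward direction, and the left/right image borders meet at the backward direction, which is why the panorama is borderless horizontally.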
We demonstrate that Argus accurately understands dynamics across the 360° scene from a narrow perspective input. Using a 360° camera, we captured a video of a car driving by while providing our model with only a 60° horizontal FoV region from a static camera pose (left). The car's ground-truth trajectory (middle) and our model's predicted trajectory (right) show strong alignment, confirming Argus's ability to accurately interpret scene dynamics.
Input Video
Ground truth trajectory
Predicted trajectory (ours)
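A 60° horizontal-FoV region like the one above can be extracted from an equirectangular frame with a pinhole (gnomonic) projection. A minimal nearest-neighbour sketch, assuming an upright camera and our own axis conventions (output size and sampling scheme are illustrative, not the paper's):

```python
import numpy as np

def perspective_from_equirect(pano, h_fov_deg=60.0, out_w=320, out_h=240, yaw_deg=0.0):
    """Sample a pinhole-camera view from an equirectangular frame.

    `pano` is an (H, W, 3) array. The virtual camera looks along `yaw_deg`
    with a horizontal field of view of `h_fov_deg`; nearest-neighbour
    sampling keeps the sketch short.
    """
    H, W = pano.shape[:2]
    f = (out_w / 2.0) / np.tan(np.radians(h_fov_deg) / 2.0)  # focal length in pixels
    xs = np.arange(out_w) - out_w / 2.0 + 0.5
    ys = np.arange(out_h) - out_h / 2.0 + 0.5
    x, y = np.meshgrid(xs, ys)
    z = np.full_like(x, f)
    # Rotate the camera rays by the yaw, then convert to spherical coordinates.
    yaw = np.radians(yaw_deg)
    xr = x * np.cos(yaw) + z * np.sin(yaw)
    zr = -x * np.sin(yaw) + z * np.cos(yaw)
    lon = np.arctan2(xr, zr)                      # [-pi, pi]
    lat = np.arctan2(-y, np.hypot(xr, zr))        # [-pi/2, pi/2]
    u = ((lon + np.pi) / (2 * np.pi) * W).astype(int) % W
    v = ((np.pi / 2 - lat) / np.pi * H).astype(int).clip(0, H - 1)
    return pano[v, u]
```

Sweeping `yaw_deg` over time yields the rotating perspective videos unwrapped from the generated panoramas elsewhere on this page.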
We unwrap a rotating perspective video from our generated 360° video and reconstruct the scene from it using MegaSaM. As shown, the reconstruction is geometrically consistent, confirming that our generated 360° video achieves high realism.
We test Argus on perspective videos generated by the text-to-video model Gen-3-Turbo with prompt "Central Park." As shown, Argus generalizes to generated videos.
Input video
360° video generated by Argus
Qualitative comparison with 360° image generation method PanoDiffusion [1]. The input region is highlighted in red, while orange and blue regions indicate extracted perspective views. Although PanoDiffusion can generate plausible 360° images from perspective inputs, it struggles with maintaining temporal consistency.
[1] Wu et al. PanoDiffusion: 360-degree Panorama Outpainting via Diffusion. In ICLR, 2024.
Input Video
Argus (ours)
PanoDiffusion
Input Video
Argus (ours)
PanoDiffusion
Input Video
Argus (ours)
PanoDiffusion
Input Video
Argus (ours)
PanoDiffusion
Input Video
Argus (ours)
PanoDiffusion
Input Video
Argus (ours)
PanoDiffusion
Input Video
Argus (ours)
PanoDiffusion
Qualitative comparison with Follow-Your-Canvas [2] for 360° video generation. Videos generated by Follow-Your-Canvas look like normal perspective videos, and its generation quality declines noticeably as it extends further from the input viewpoint.
[2] Chen et al. Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation. In AAAI, 2025.
Argus (ours)
Follow-Your-Canvas
Argus (ours)
Follow-Your-Canvas
Argus (ours)
Follow-Your-Canvas
Argus (ours)
Follow-Your-Canvas
Argus (ours)
Follow-Your-Canvas
Argus (ours)
Follow-Your-Canvas
Input video
Stabilization (Argus)
Stabilization (reference)
Input video
Stabilization (Argus)
Stabilization (reference)
Input video
Stabilization (Argus)
Stabilization (reference)
Input video
Stabilization (Argus)
Stabilization (reference)
Input video
Stabilization (Argus)
Stabilization (reference)
Input video
Stabilization (Argus)
Stabilization (reference)
Input Video
Clockwise rotation by 30 degrees
Clockwise rotation by 45 degrees
Input Video
Clockwise rotation by 30 degrees
Clockwise rotation by 45 degrees
Input Video
Clockwise rotation by 30 degrees
Clockwise rotation by 45 degrees
Input Video
Clockwise rotation by 30 degrees
Clockwise rotation by 45 degrees
Input Video
Counterclockwise rotation by 30 degrees
Counterclockwise rotation by 45 degrees
Input Video
Counterclockwise rotation by 30 degrees
Counterclockwise rotation by 45 degrees
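Viewpoint control of this kind is natural in the equirectangular representation: because longitude maps linearly to the horizontal pixel axis, a yaw rotation is an exact circular shift of pixel columns, with no resampling needed when the angle is a multiple of the per-pixel step (360°/width). A minimal sketch (the shift direction assigned to "clockwise" is our assumed convention):

```python
import numpy as np

def rotate_yaw(pano, degrees):
    """Rotate an equirectangular panorama about the vertical axis.

    A positive angle here shifts content leftward, i.e. the virtual
    camera turns clockwise when viewed from above (assumed convention).
    """
    shift = int(round(degrees / 360.0 * pano.shape[1]))
    return np.roll(pano, -shift, axis=1)
```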
Argus enables realistic object relighting by leveraging the generated 360° panorama videos as dynamic environment maps. We show the results of rendering a metallic sphere in Blender with the generated videos.
We start with the 360-1M dataset [3], containing approximately 1 million videos of varying quality, and systematically filter down to 283,863 high-quality 10-second video clips. Examples of our dataset are shown below.
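The segment-and-filter structure of such a curation pipeline can be sketched as follows. The `quality` field is a hypothetical stand-in for the paper's filtering criteria, which are not detailed here; only the overall shape (threshold, then split passing videos into fixed-length clips) is illustrated:

```python
from dataclasses import dataclass

@dataclass
class Clip:
    video_id: str
    duration_s: float   # total video length in seconds
    quality: float      # hypothetical per-video quality score in [0, 1]

def segment_clips(videos, clip_len=10.0, min_quality=0.8):
    """Split videos passing a quality threshold into fixed-length clips.

    Returns (video_id, start_s, end_s) tuples for each clip; videos below
    the threshold are discarded, and a trailing remainder shorter than
    `clip_len` is dropped.
    """
    clips = []
    for v in videos:
        if v.quality < min_quality:
            continue
        n = int(v.duration_s // clip_len)
        for i in range(n):
            clips.append((v.video_id, i * clip_len, (i + 1) * clip_len))
    return clips
```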
[3] Wallingford et al. From an Image to a Scene: Learning to Imagine the World from a Million 360° Videos. In NeurIPS, 2024.