Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos

Rundong Luo1   Matthew Wallingford2   Ali Farhadi2   Noah Snavely1   Wei-Chiu Ma1
1Cornell University   2University of Washington

Note: It may take some time for the videos to load.

360° videos generated by our model, Argus*. Starting from an input perspective video with arbitrary camera motion (red boxes), Argus generates a full 360° panoramic video, with the red box indicating the corresponding region in the generated frame. The blue, orange, and purple boxes show additional sampled perspectives from the generated 360° video.

*Argus is named after a figure in Greek mythology with many eyes, symbolizing the ability to observe from multiple perspectives.

Interactive 360° Video Visualization

Hover over the video and click it to view in 360°

We test Argus on in-the-wild videos capturing everyday activities to verify its robustness. The input region is highlighted in red. As shown, Argus can generate long-term, immersive, and realistic 360° videos from real-world perspective inputs.

Abstract

360° videos have emerged as a promising medium to represent our dynamic visual world. Compared to the "tunnel vision" of standard cameras, their borderless field of view offers a more holistic perspective of our surroundings. However, while existing video models excel at producing standard videos, their ability to generate full panoramic videos remains elusive. In this paper, we investigate the task of video-to-360° generation: given a perspective video as input, our goal is to generate a full panoramic video that is coherent with the input. Unlike conventional video generation tasks, the output's field of view is significantly larger, and the model is required to have a deep understanding of both the spatial layout of the scene and the dynamics of objects to maintain geometric and dynamic consistency with the input. To address these challenges, we first leverage the abundant 360° videos available online and develop a high-quality data filtering pipeline to curate pairwise training data. We then carefully design a series of geometry- and motion-aware modules to facilitate the learning process and improve the quality of 360° video generation. Experimental results demonstrate that our model can generate realistic and coherent 360° videos from arbitrary, in-the-wild perspective inputs. Additionally, we showcase its potential applications, including video stabilization, camera viewpoint control, and interactive visual question answering.

Analysis: Interpreting Scene Dynamics

We demonstrate that Argus accurately understands dynamics across the 360° scene from a narrow perspective input. Using a 360° camera, we captured a video of a car driving by and provided our model with a 60° horizontal FoV region from a static camera pose (left). The car's ground-truth trajectory (middle) and our model's predicted trajectory (right) align closely, confirming Argus's ability to interpret scene dynamics. A sketch of how such a fixed-FoV perspective region can be extracted from an equirectangular frame follows the figure below.


Input Video

Ground truth trajectory

Predicted trajectory (ours)
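
For readers curious how the 60° input region is obtained, below is a minimal sketch (not the paper's code) of sampling a fixed-FoV pinhole view from an equirectangular 360° frame. It assumes NumPy and OpenCV; the function name, output resolution, and rotation convention are our own illustrative choices.

```python
# Minimal sketch: sample a perspective (pinhole) view with a given horizontal FoV
# from an equirectangular 360° frame. Conventions here are assumptions, not the
# paper's implementation.
import numpy as np
import cv2


def equirect_to_perspective(pano, h_fov_deg=60.0, yaw_deg=0.0, pitch_deg=0.0,
                            out_w=640, out_h=360):
    """Render a pinhole view of `pano` looking along (yaw, pitch)."""
    pano_h, pano_w = pano.shape[:2]
    h_fov = np.deg2rad(h_fov_deg)
    focal = 0.5 * out_w / np.tan(0.5 * h_fov)          # pinhole focal length in pixels

    # Ray directions in the virtual camera frame (x right, y down, z forward).
    xs, ys = np.meshgrid(np.arange(out_w), np.arange(out_h))
    dirs = np.stack([(xs - 0.5 * out_w) / focal,
                     (ys - 0.5 * out_h) / focal,
                     np.ones_like(xs, dtype=np.float64)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate rays by the requested viewing direction (yaw about y, pitch about x).
    yaw, pitch = np.deg2rad(yaw_deg), np.deg2rad(pitch_deg)
    R_yaw = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                      [0, 1, 0],
                      [-np.sin(yaw), 0, np.cos(yaw)]])
    R_pitch = np.array([[1, 0, 0],
                        [0, np.cos(pitch), -np.sin(pitch)],
                        [0, np.sin(pitch), np.cos(pitch)]])
    dirs = dirs @ (R_yaw @ R_pitch).T

    # Convert rays to longitude/latitude, then to equirectangular pixel coordinates.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])        # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))   # [-pi/2, pi/2]
    map_x = np.mod((lon / (2 * np.pi) + 0.5) * pano_w, pano_w).astype(np.float32)
    map_y = np.clip((lat / np.pi + 0.5) * pano_h, 0, pano_h - 1).astype(np.float32)
    return cv2.remap(pano, map_x, map_y, cv2.INTER_LINEAR)
```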

Analysis: Reconstructing the Scene from Generated Videos

We unwrap a rotating perspective video from our generated 360° video and show the scene reconstructed from it using MegaSaM. As shown, the reconstruction is geometrically consistent, indicating that our generated 360° videos achieve high realism. A small sketch of the unwrapping step follows below.


Click the image to view interactive results


MegaSaM Reconstruction
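
As an illustration of the unwrapping step, here is a hypothetical use of the `equirect_to_perspective` sketch above: pan the virtual camera's yaw over the generated panoramic frames to produce the rotating perspective video that is fed to MegaSaM. The variable names are assumptions.

```python
# Hypothetical usage of the sketch above: `pano_frames` is a list of generated
# equirectangular frames; the yaw advances one full revolution over the clip.
rotating_video = [
    equirect_to_perspective(pano, h_fov_deg=60.0,
                            yaw_deg=i * 360.0 / len(pano_frames))
    for i, pano in enumerate(pano_frames)
]
```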

Analysis: Using Generated Videos as Input

We test Argus on perspective videos generated by the text-to-video model Gen-3-Turbo with the prompt "Central Park." As shown, Argus generalizes to generated videos as well.


Input video

360° video generated by Argus

Comparison with PanoDiffusion (Image-to-360° Generation)

Qualitative comparison with the 360° image generation method PanoDiffusion [1]. The input region is highlighted in red, while orange and blue regions indicate extracted perspective views. Although PanoDiffusion can generate plausible 360° images from perspective inputs, it struggles to maintain temporal consistency across frames.


[1] Wu et al. PanoDiffusion: 360-degree Panorama Outpainting via Diffusion. In ICLR, 2024.

Comparison with Follow-Your-Canvas (Video Outpainting)

Qualitative comparison with Follow-Your-Canvas [2] for 360° video generation. Videos generated by Follow-Your-Canvas look like ordinary perspective videos rather than panoramas, and their quality declines noticeably as the generation extends further from the input viewpoint.


[2] Chen et al. Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation. In AAAI, 2025.

Application: Video Stabilization

Argus can be applied to video stabilization without any modification. Traditional video stabilization techniques require cropping, which reduces the field of view and discards visual information. In contrast, Argus enables stabilization with a consistent field of view, since the generated panorama preserves scene information across frames.
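
As a rough illustration of this idea (not the authors' implementation), if each generated panoramic frame is aligned with the shaky input camera and per-frame camera orientations are available (e.g., from a SLAM or IMU estimate, an assumption on our part), sampling every frame along one fixed world direction cancels the camera shake without cropping:

```python
# Hedged sketch: cancel the input camera's per-frame rotation by sampling the
# generated panorama along a fixed world direction. `pano_frames` and
# `orientations` (per-frame yaw/pitch of the input camera) are assumed inputs.
stabilized = [
    equirect_to_perspective(pano, h_fov_deg=60.0, yaw_deg=-yaw, pitch_deg=-pitch)
    for pano, (yaw, pitch) in zip(pano_frames, orientations)
]
```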

Application: Camera View Direction Control

Argus enables camera viewpoint control in dynamic environments by unwrapping the generated 360° scene into perspective views. This capability allows exploration beyond the initial field of view, enhancing immersion in the scene.

Application: Dynamic Environment Map for Object Relighting

Argus enables realistic object relighting by using the generated 360° panoramic videos as dynamic environment maps. We show the results of rendering a metallic sphere in Blender with the generated videos.
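
For reference, below is a minimal Blender Python (bpy) sketch of this kind of setup, not the authors' exact pipeline: one generated equirectangular frame is loaded as the world environment texture and a metallic sphere is rendered under it. The file path and material values are placeholders.

```python
# Minimal bpy sketch (run inside Blender): use a generated equirectangular frame
# as the world environment map and render a metallic sphere lit by it.
import bpy

world = bpy.context.scene.world
world.use_nodes = True
nodes, links = world.node_tree.nodes, world.node_tree.links

# Environment texture from one generated panoramic frame (placeholder path).
env = nodes.new("ShaderNodeTexEnvironment")
env.image = bpy.data.images.load("/path/to/generated_pano_frame_0001.png")
links.new(env.outputs["Color"], nodes["Background"].inputs["Color"])

# Add a metallic sphere to be relit by the panorama.
bpy.ops.mesh.primitive_uv_sphere_add(radius=1.0, location=(0, 0, 0))
sphere = bpy.context.object
mat = bpy.data.materials.new("Metal")
mat.use_nodes = True
bsdf = mat.node_tree.nodes["Principled BSDF"]
bsdf.inputs["Metallic"].default_value = 1.0
bsdf.inputs["Roughness"].default_value = 0.05
sphere.data.materials.append(mat)

bpy.ops.render.render(write_still=True)
```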


Application: Interactive Visual Question Answering

The panoramic videos generated by Argus can aid visual question answering in dynamic environments. By enabling free rotation of the camera, Argus allows the scene to be viewed from multiple perspectives, supporting comprehensive spatial understanding. This flexibility enables interactive visual question answering, such as verifying whether a vehicle overlaps with a crosswalk, overcoming the limitations of fixed-viewpoint videos. This capability enhances scene comprehension and opens new possibilities for video analysis applications.


Dataset

We start with the 360-1M dataset [3], which contains approximately 1 million videos of varying quality, and systematically filter it down to 283,863 high-quality 10-second video clips. Examples from our dataset are shown below.
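
Purely as an illustration of the clip format (the actual filtering criteria are not part of this sketch), a source 360° video could be cut into fixed 10-second segments with ffmpeg before any quality filtering is applied; paths and clip counts below are placeholders.

```python
# Illustrative sketch only: cut a source video into consecutive 10-second clips
# via ffmpeg stream copy. This is not the paper's filtering pipeline.
import subprocess


def split_into_clips(src_path, dst_pattern, clip_len_s=10, num_clips=6):
    """Cut `src_path` into consecutive `clip_len_s`-second clips."""
    clips = []
    for i in range(num_clips):
        dst = dst_pattern.format(i)  # e.g. "clips/clip_{:04d}.mp4"
        subprocess.run(
            ["ffmpeg", "-y", "-ss", str(i * clip_len_s), "-t", str(clip_len_s),
             "-i", src_path, "-c", "copy", dst],
            check=True,
        )
        clips.append(dst)
    return clips
```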


[3] Wallingford et al. From an Image to a Scene: Learning to Imagine the World from a Million 360° Videos. In NeurIPS, 2024.