¹Peking University  ²Peng Cheng Laboratory
In this work, we present a novel similarity min-max framework for zero-shot day-night domain adaptation. On the image level, we generate a synthetic nighttime domain that shares minimum feature similarity with the daytime domain to enlarge the domain gap. On the model level, we learn illumination-robust representations by maximizing the feature similarity of images from the two domains for better model adaptation. Our framework can serve as a plug-and-play remedy for existing daytime models. To verify its effectiveness, we conduct extensive experiments on multiple high-level nighttime vision tasks, including classification, semantic segmentation, visual place recognition, and video action recognition. Results on various benchmarks demonstrate that our method outperforms the state of the art.
Low-light conditions not only hamper human visual experience but also degrade the model's performance on downstream vision tasks. While existing works make remarkable progress on day-night domain adaptation, they rely heavily on domain knowledge derived from task-specific nighttime datasets. This paper tackles a more complicated scenario with broader applicability, i.e., zero-shot day-night domain adaptation, which eliminates reliance on any nighttime data. Unlike prior zero-shot adaptation approaches emphasizing either image-level translation or model-level adaptation, we propose a similarity min-max paradigm that considers both under a unified framework. On the image level, we darken images towards minimum feature similarity to enlarge the domain gap. Then, on the model level, we maximize the feature similarity between the darkened images and their normal-light counterparts for better model adaptation. To the best of our knowledge, this work is the first to jointly optimize both aspects, resulting in a significant improvement in model generalizability. Extensive experiments demonstrate our method's effectiveness and broad applicability on various nighttime vision tasks, including classification, semantic segmentation, visual place recognition, and video action recognition.
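The day-night feature similarity referred to above can be instantiated, for instance, as the cosine similarity between globally pooled features of a daytime image and its darkened counterpart. Below is a minimal PyTorch-style sketch of such a measure; the names (`feature_extractor`, `day_imgs`, `dark_imgs`) and the global-average-pooling choice are illustrative assumptions, not the exact implementation from the paper.

```python
import torch.nn.functional as F

def day_night_similarity(feature_extractor, day_imgs, dark_imgs):
    """Cosine similarity between daytime and darkened-image features (higher = more similar)."""
    f_day = feature_extractor(day_imgs)                    # (B, C, H, W) feature maps
    f_dark = feature_extractor(dark_imgs)
    f_day = F.adaptive_avg_pool2d(f_day, 1).flatten(1)     # (B, C) pooled descriptors
    f_dark = F.adaptive_avg_pool2d(f_dark, 1).flatten(1)
    return F.cosine_similarity(f_day, f_dark, dim=1).mean()
```

This quantity is minimized with respect to the darkening module in stage (a) and maximized with respect to the task model in stage (b).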
We propose a similarity min-max framework for zero-shot day-night domain adaptation. (a) We first train a darkening module $D$ with a fixed feature extractor to generate synthesized nighttime images that share minimum similarity with their daytime counterparts. (b) After obtaining $D$, we freeze its weights and maximize the day-night feature similarity to adapt the model to nighttime.
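The following is a hedged PyTorch-style sketch of how the two stages could be alternated: stage (a) updates only the darkening module $D$ to minimize day-night feature similarity, and stage (b) freezes $D$ and updates the task model to maximize that similarity alongside a supervised task loss. Module names, the data loader format, the classification head, and hyper-parameters are placeholder assumptions for illustration, not the authors' exact training code.

```python
import torch
import torch.nn.functional as F

def pooled_feat(extractor, imgs):
    """Globally pooled (B, C) feature descriptors from a convolutional feature extractor."""
    return F.adaptive_avg_pool2d(extractor(imgs), 1).flatten(1)

def train_darkening_module(darkening_module, feature_extractor, day_loader, lr=1e-4):
    """Stage (a): minimize day-night feature similarity w.r.t. the darkening module D."""
    feature_extractor.eval()
    for p in feature_extractor.parameters():       # the feature extractor stays fixed in this stage
        p.requires_grad_(False)
    opt = torch.optim.Adam(darkening_module.parameters(), lr=lr)
    for day_imgs, _ in day_loader:
        dark_imgs = darkening_module(day_imgs)     # synthesized nighttime images
        sim = F.cosine_similarity(pooled_feat(feature_extractor, day_imgs),
                                  pooled_feat(feature_extractor, dark_imgs), dim=1).mean()
        opt.zero_grad()
        sim.backward()                             # descending on the similarity minimizes it
        opt.step()

def adapt_model(darkening_module, feature_extractor, task_head, day_loader, lr=1e-4):
    """Stage (b): freeze D, maximize day-night feature similarity while keeping task accuracy."""
    darkening_module.eval()
    for p in darkening_module.parameters():        # D stays fixed in this stage
        p.requires_grad_(False)
    opt = torch.optim.Adam(list(feature_extractor.parameters()) + list(task_head.parameters()), lr=lr)
    for day_imgs, labels in day_loader:
        with torch.no_grad():
            dark_imgs = darkening_module(day_imgs)
        f_day = pooled_feat(feature_extractor, day_imgs)
        f_dark = pooled_feat(feature_extractor, dark_imgs)
        sim = F.cosine_similarity(f_day, f_dark, dim=1).mean()
        task_loss = F.cross_entropy(task_head(f_dark), labels)   # supervised loss on darkened images
        loss = task_loss - sim                     # maximizing similarity = minimizing its negative
        opt.zero_grad()
        loss.backward()
        opt.step()
```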
Nighttime Semantic Segmentation. Qualitative comparison results for semantic segmentation on Dark-Zurich. Low-light enhancement methods perform poorly on nighttime street scenes. Our method better extracts information hidden by darkness and thus generates more accurate semantic maps.
Visual Place Recognition. For each group, the query image is shown on the left; the top-two retrieved images (i.e., the two database images with the highest feature similarity to the query) are shown on the right. Compared with the baseline model (GeM), which is often deceived by the nighttime appearance, our model extracts features that are more robust to illumination and thus retrieves the correct daytime image showing the same scene as the query.
Low-Light Video Action Recognition. Qualitative comparison results for low-light video action recognition on ARID. The compared video enhancement methods perform poorly and mislead the classifier, while our adapted model correctly classifies the video with more than 99% confidence.
@inproceedings{luo2023similarity,
  title={Similarity Min-Max: Zero-Shot Day-Night Domain Adaptation},
  author={Luo, Rundong and Wang, Wenjing and Yang, Wenhan and Liu, Jiaying},
  booktitle={ICCV},
  year={2023},
}
If you have any questions, please feel free to contact us:
Group Page: STRUCT