Research
My research lies broadly in computer vision and multi-modal learning, especially generative models and AIGC-related topics,
including cross-view & novel view synthesis, personalized text-to-image generation, and fundamental computer vision problems such as stereo matching, optical flow estimation, depth
estimation and domain adaptation.
|
|
Mixed-View Panorama Synthesis using Geospatially Guided Diffusion
Zhexiao Xiong,
Xin Xing,
Scott Workman,
Subash Khanal,
Nathan Jacobs
arXiv
We introduce the task of mixed-view panorama synthesis, where the goal is to synthesize a novel panorama given a small set of input panoramas and a satellite image of the area.
This contrasts with previous work, which uses only input panoramas (same-view synthesis) or only an input satellite image (cross-view synthesis).
We argue that the mixed-view setting is the most natural to support panorama synthesis for arbitrary locations worldwide.
A critical challenge is that the spatial coverage of panoramas is uneven, with few panoramas available in many regions of the world.
We introduce an approach that utilizes diffusion-based modeling and an attention-based architecture for extracting information from all available input imagery.
Experimental results demonstrate the effectiveness of our proposed method. In particular, our model can handle scenarios in which the available panoramas are sparse or far from the location of the panorama we are attempting to synthesize.
|
|
Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning
Xin Xing,
Zhexiao Xiong,
Abby Stylianou,
Srikumar Sastry,
Liyu Gong,
Nathan Jacobs
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2024
arXiv
We propose a novel approach called Vision-Language Pseudo-Labeling (VLPL), which uses a vision-language model to suggest strong positive and negative
pseudo-labels, and outperforms the current SOTA methods by 5.5% on Pascal VOC, 18.4% on MS-COCO, 15.2% on NUS-WIDE, and 8.4% on CUB-Birds.
|
|
StereoFlowGAN: Co-training for Stereo and Flow with Unsupervised Domain Adaptation
Zhexiao Xiong,
Feng Qiao,
Yu Zhang,
Nathan Jacobs
British Machine Vision Conference (BMVC), 2023
arXiv
We introduce a novel training strategy for stereo matching and optical flow estimation that utilizes image-to-image translation between synthetic and real image domains.
Our approach enables the training of models that excel in real image scenarios while relying solely on ground-truth information from synthetic images.
To facilitate task-agnostic domain adaptation and the training of task-specific components, we introduce a bidirectional feature warping module that handles both left-right and forward-backward directions.
Experimental results show competitive performance over previous domain translation-based methods,
which substantiates the efficacy of our proposed framework, effectively leveraging the benefits of unsupervised domain adaptation, stereo matching, and optical flow estimation.
|
|
PruneFaceDet: Pruning lightweight face detection network by sparsity training
Nanfei Jiang,
Zhexiao Xiong,
Hui Tian,
Xu Zhao,
Xiaojie Du,
Chaoyang Zhao,
Jinqiao Wang
Cognitive Computation and Systems
We propose a network pruning pipeline, PruneFaceDet, to prune the lightweight face detection network, which performs training
with L1 regularisation before channel pruning (CP). We compare two thresholding methods to obtain proper
pruning thresholds in the CP stage. We apply the proposed pruning pipeline to the lightweight
face detector and evaluate its performance on the WiderFace dataset, achieving a 56.3% reduction in
parameter size with almost no accuracy drop.
|
Services
Conference Reviewer: ECCV 2024, NeurIPS 2024
Internships: CASIA (2021), OPPO (2022 Spring), OPPO US Research Center (2024 Summer)
|
Thanks to Jon Barron for sharing his website's source code.