Zhexiao Xiong

I am a second-year Ph.D. candidate in Computer Science in the Multimodal Vision Research Laboratory (MVRL) at Washington University in St. Louis, advised by Prof. Nathan Jacobs. Before that, I received my bachelor's degree in Electrical and Information Engineering from Tianjin University. I spent a wonderful year at the Institute of Automation, Chinese Academy of Sciences (CASIA), working with Prof. Jinqiao Wang and Dr. Xu Zhao. I was also a research intern at OPPO Research.

Email  /  CV  /  Google Scholar  /  LinkedIn  /  GitHub

Research

My research lies broadly in computer vision and multi-modal learning, especially generative models and their applications to autonomous driving and remote sensing scenes. This includes cross-view and novel view synthesis, bird's-eye-view perception, and fundamental computer vision problems such as stereo matching, optical flow estimation, depth estimation, and domain adaptation.


I am actively looking for internships for Summer 2024. Feel free to contact me!
Mixed-View Panorama Synthesis using Geospatially Guided Diffusion
Zhexiao Xiong, Xin Xing, Scott Workman, Subash Khanal, Nathan Jacobs
(In Submission)

We use geospatial information to guide a diffusion model for the mixed-view panorama synthesis task.

Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning
Xin Xing, Zhexiao Xiong, Abby Stylianou, Srikumar Sastry, Liyu Gong, Nathan Jacobs
IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2024
arXiv

We propose a novel approach called Vision-Language Pseudo-Labeling (VLPL), which uses a vision-language model to suggest strong positive and negative pseudo-labels. VLPL outperforms the current SOTA methods by 5.5% on Pascal VOC, 18.4% on MS-COCO, 15.2% on NUS-WIDE, and 8.4% on CUB-Birds.

StereoFlowGAN: Co-training for Stereo and Flow with Unsupervised Domain Adaptation
Zhexiao Xiong, Feng Qiao, Yu Zhang, Nathan Jacobs
British Machine Vision Conference (BMVC), 2023
arXiv

We introduce a novel training strategy for stereo matching and optical flow estimation that utilizes image-to-image translation between synthetic and real image domains. Our approach enables the training of models that excel in real image scenarios while relying solely on ground-truth information from synthetic images. To facilitate task-agnostic domain adaptation and the training of task-specific components, we introduce a bidirectional feature warping module that handles both left-right and forward-backward directions. Experimental results show competitive performance over previous domain translation-based methods, which substantiates the efficacy of our proposed framework in leveraging the benefits of unsupervised domain adaptation, stereo matching, and optical flow estimation.

PruneFaceDet: Pruning lightweight face detection network by sparsity training
Nanfei Jiang, Zhexiao Xiong, Hui Tian, Xu Zhao, Xiaojie Du, Chaoyang Zhao, Jinqiao Wang
Cognitive Computation and Systems

We propose a network pruning pipeline, PruneFaceDet, to prune a lightweight face detection network. The pipeline performs training with L1 regularisation before channel pruning (CP). We compare two thresholding methods to obtain proper pruning thresholds in the CP stage. We apply the proposed pruning pipeline to a lightweight face detector and evaluate its performance on the WiderFace dataset, achieving a 56.3% reduction in parameter size with almost no accuracy drop.

Services

Conference Reviewer: ECCV 2024
Internships: CASIA (2021), OPPO (2022 Spring)


Thanks to Jon Barron for sharing his website's source code.