Jie Yang 杨杰

Senior Researcher · Multimodal Foundation Models

WeChat Vision, Tencent
📍 Beijing, China
✉️ cvjieyang@tencent.com

About

Biography

I am a Senior Researcher at WeChat Vision, Tencent, working on multimodal foundation models. My research covers the full training stack, from large-scale pretraining to post-training optimization, for models that perceive, reason, and act across image, video, audio, and language.

I received my Ph.D. from The Chinese University of Hong Kong, Shenzhen, where I was advised by Prof. Ruimao Zhang and Prof. Zhen Li. During my Ph.D. (2021–2025), I was fortunate to complete research stays at IDEA, BIGAI, SenseTime Research, and Tencent, where I worked on multimodal learning for diverse vision tasks.

Omni-modal Foundation Models · MLLM Pretraining & Post-Training · Video/Audio Understanding
📣 Hiring interns! We are actively seeking self-motivated interns to work on related research topics, including image / video / omni pretraining & post-training. If you're interested, feel free to reach out via email!

Updates

News

Research [Full list on Google Scholar]

Selected Publications

# corresponding author

Community

Academic Services

Conference & Journal Reviewer

Serving as a reviewer for top-tier venues in computer vision and machine learning:
CVPR · ICLR · ICML · NeurIPS · ECCV · ICCV · T-PAMI · TNNLS · TMM

Challenge Organizer

Co-organizer of the MICCAI AMOS Segmentation Challenge 2022, a large-scale benchmark on multi-organ abdominal segmentation across CT and MRI modalities.

Recognition

Honors & Awards