Jie Yang 杨杰

Senior Researcher · Multimodal Foundation Models

WeChat Vision, Tencent
📍 Beijing, China
✉️ cvjieyang@tencent.com

Biography

I am a Senior Researcher at WeChat Vision, Tencent, working on multimodal foundation models. My research interests span the full training stack, from large-scale pretraining to post-training optimization, for models that perceive, reason, and act across image, video, audio, and language.

I received my Ph.D. from The Chinese University of Hong Kong, Shenzhen, advised by Prof. Ruimao Zhang and Prof. Zhen Li. During my Ph.D. (2021–2025), I worked toward building intelligent agents that can collaborate with humans in dynamic environments, exploring three closely connected directions: (1) human-centric visual understanding and reasoning, with a focus on interpreting human states, behaviors, and intentions (e.g., ED-Pose, X-Pose); (2) multimodal scene perception, integrating diverse modalities to make sense of complex real-world scenes (e.g., MP-HOI, Magic-HOI); and (3) behavior planning and decision-making, enabling agents to act on what they perceive and understand (e.g., F-HOI, InteractAnything, VIKI-R).

Omni-modal Foundation Models · MLLM Pretraining & Post-Training · Video/Audio Understanding

📣 Hiring interns! We are actively seeking self-motivated interns to work on related research topics, including image, video, and omni-modal pretraining and post-training. If you're interested, feel free to reach out via email!

News

Selected Publications (full list on Google Scholar)

# denotes corresponding author

Academic Services

Conference & Journal Reviewer

Reviewer for top-tier venues in computer vision and machine learning: CVPR, ICLR, ICML, NeurIPS, ECCV, ICCV, T-PAMI, TNNLS, and TMM.

Challenge Organizer

Co-organizer of the MICCAI AMOS Segmentation Challenge 2022, a large-scale benchmark on multi-organ abdominal segmentation across CT and MRI modalities.

Honors & Awards