Jie Yang 杨杰

Senior Researcher · Multimodal Foundation Models

WeChat Vision, Tencent

📍 Beijing, China

Biography

I am a Senior Researcher at WeChat Vision, Tencent, working on multimodal foundation models. My research interests span the full training stack, from large-scale pretraining to post-training optimization, for models that perceive, reason, and act across images, video, audio, and language.

I received my Ph.D. from The Chinese University of Hong Kong, Shenzhen, advised by Prof. Ruimao Zhang and Prof. Zhen Li. During my Ph.D. (2021–2025), I worked toward intelligent agents that can collaborate with humans in dynamic environments, exploring three closely connected directions: (1) human-centric visual understanding and reasoning, with a focus on interpreting human states, behaviors, and intentions (e.g., ED-Pose, X-Pose); (2) multi-modal scene perception, integrating diverse modalities to make sense of complex real-world scenes (e.g., MP-HOI, Magic-HOI); and (3) behavior planning and decision-making, enabling agents to act upon what they perceive and understand (e.g., F-HOI, InteractAnything, VIKI-R).

▸Omni-modal Foundation Models ▸MLLM Pretraining & Post-Training ▸Video/Audio Understanding

📣 Hiring interns! We are actively seeking self-motivated interns to work on related research topics, including image, video, and omni-modal pretraining and post-training. If you're interested, feel free to reach out via email!

News

One paper is accepted to ECCV 2026.
One paper is accepted to NeurIPS 2025.
We present WeThink for general-purpose vision-language reasoning.
One paper is accepted to T-PAMI.
One paper is accepted to CVPR 2025.
One paper is accepted to ICRA 2025.
One paper is accepted to NeurIPS 2024.
Grounding DINO is selected as The Most Influential Paper in ECCV 2024.
Three papers are accepted to ECCV 2024.
One paper is accepted to CVPR 2024.
We present X-Pose to detect any keypoints of any objects.
Grounded SAM is accepted to ICCV 2023 Demo Track.
One paper is accepted to ICCV 2023.
Two papers are accepted to MIDL 2023, one of them selected for oral presentation.
One paper is accepted to CVPR 2023.
One paper is accepted to ICLR 2023.
One paper is accepted to NeurIPS 2022.

Selected Publications [Full list on Google Scholar]

^# corresponding author

Stage-adaptive Token Selection for Efficient Omni-modal LLMs

Zijie Xin, Jie Yang^#, Ruixiang Zhao, Tianyi Wang, Fengyun Rao, Jing LYU, Xirong Li^#.

arXiv preprint 2026 PDF →

OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding

Ruixiang Zhao, Jie Yang^#, Zijie Xin, Tianyi Wang, Fengyun Rao, Jing LYU, Xirong Li^#.

arXiv preprint 2026 PDF →

WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning

Jie Yang, Feipeng Ma, Zitian Wang, Dacheng Yin, Kang Rong, Fengyun Rao, Ruimao Zhang.

Tech Report 2025 PDF →

VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

Li Kang, Xiufeng Song, Heng Zhou, Yiran Qin, Jie Yang, Xiaohong Liu, Philip Torr, Lei Bai, Zhenfei Yin.

NeurIPS D&B 2025 PDF →

ED-Pose++: Enhanced Explicit Box Detection for Conventional and Interactive Multi-Object Keypoint Detection

Jie Yang, Ailing Zeng, Tianhe Ren, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang.

T-PAMI 2025 IEEE →

InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing

Jinlu Zhang, Yixin Chen, Zan Wang, Jie Yang, Yizhou Wang, Siyuan Huang.

CVPR 2025 PDF →

Unlock the Power of Unlabeled Data in Language Driving Model

Chaoqun Wang, Jie Yang, Xiaobin Hong, Ruimao Zhang.

ICRA 2025 PDF →

KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension

Jie Yang, Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Ruimao Zhang.

NeurIPS 2024 PDF →

X-Pose: Detecting Any Keypoints

Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang.

ECCV 2024 PDF →

F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

Jie Yang, Xuesong Niu, Nan Jiang, Ruimao Zhang, Siyuan Huang.

ECCV 2024 PDF →

Open-World Human-Object Interaction Detection via Multi-modal Prompts

Jie Yang, Bingliang Li, Ailing Zeng, Lei Zhang, Ruimao Zhang.

CVPR 2024 PDF →

Neural Interactive Keypoint Detection

Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang.

ICCV 2023 PDF →

Semantic Human Parsing via Scalable Semantic Transfer over Multiple Label Domains

Jie Yang, Chaoqun Wang, Zhen Li, Junle Wang, Ruimao Zhang.

CVPR 2023 PDF →

Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation

Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang.

ICLR 2023 PDF →

Toward Unpaired Multi-modal Medical Image Segmentation via Learning Structured Semantic Consistency

Jie Yang, Ye Zhu, Chaoqun Wang, Zhen Li, Ruimao Zhang.

MIDL 2023 PDF →

Academic Services

Conference & Journal Reviewer

Serving as a reviewer for top-tier venues in computer vision and machine learning:

CVPR · ICLR · ICML · NeurIPS · ECCV · ICCV · T-PAMI · TNNLS · TMM

Challenge Organizer

Co-organizer of the MICCAI AMOS Segmentation Challenge 2022, a large-scale benchmark on multi-organ abdominal segmentation across CT and MRI modalities.

Honors & Awards

🏆 First Prize Scholarship · 2018, 2019, 2020
🏆 National Scholarship · 2018