Publications

2026

  1. LLM/VLM
    GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows
    Jize Wang, Xuanxuan Liu, Yining Li, Songyang Zhang, Yijun Wang, and 5 more authors
    arXiv, 2026
  2. Agent
    RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents
    Jize Wang, Han Wu, Zhiyuan You, Yiming Song, Yijun Wang, and 7 more authors
    In ACL, 2026
  3. Agent
    TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration
    Zerun Ma, Guoqiang Wang, Xinchen Xie, Yicheng Chen, He Du, and 5 more authors
    arXiv, 2026
  4. LLM/VLM
    Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization
    He Du, Qiming Ge, Jiakai Hu, Aijun Yang, Zheng Cai, and 16 more authors
    arXiv, 2026
  5. LLM/VLM
    DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning
    Yicheng Chen, Zerun Ma, Xinchen Xie, Yining Li, and Kai Chen
    arXiv, 2026

2025

  1. LLM/VLM
    MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
    Yicheng Chen, Yining Li, Kai Hu, Zerun Ma, Haochen Ye, and 1 more author
    In Findings of ACL, 2025
  2. Vision & Multimodality
    Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
    Yicheng Chen, Xiangtai Li, Yining Li, Yanhong Zeng, Jianzong Wu, and 2 more authors
    In CVPR, 2025
  3. Vision & Multimodality
    MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
    Xiangyu Zhao, Xiangtai Li, Haodong Duan, Haian Huang, Yining Li, and 2 more authors
    IEEE TCSVT, 2025

2024

  1. LLM/VLM
    InternLM-Law: An Open Source Chinese Legal Large Language Model
    Zhiwei Fei, Songyang Zhang, Xiaoyu Shen, Dawei Zhu, Xiao Wang, and 7 more authors
    arXiv, 2024
  2. Vision & Multimodality
    MotionBooth: Motion-Aware Customized Text-to-Video Generation
    Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, and 3 more authors
    In NeurIPS Spotlight, 2024
  3. LLM/VLM
    Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization
    Kai Hu, Weichen Yu, Tianjun Yao, Xiang Li, Wenhe Liu, and 5 more authors
    In NeurIPS, 2024
  4. LLM/VLM
    InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
    Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, and 4 more authors
    In NeurIPS, 2024
  5. LLM/VLM
    InternLM2 Technical Report
    Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, and 4 more authors
    arXiv, 2024
  6. LLM/VLM
    InternLM-XComposer2: Mastering Free-Form Text-Image Composition and Comprehension in Vision-Language Large Model
    Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, and 4 more authors
    arXiv, 2024
  7. Agent
    GTA: A Benchmark for General Tool Agents
    Jize Wang, Zerun Ma, Yining Li, Songyang Zhang, Cailian Chen, and 2 more authors
    In NeurIPS Datasets and Benchmarks Track, 2024
  8. Vision & Multimodality
    RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation
    Tao Jiang, Xinchen Xie, and Yining Li
    In CVPR, 2024
  9. Vision & Multimodality
    Open-vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
    Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, and 1 more author
    In ECCV, 2024
  10. Vision & Multimodality
    RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation
    Peng Lu, Tao Jiang, Yining Li, Xiangtai Li, Kai Chen, and 1 more author
    In CVPR, 2024
  11. Vision & Multimodality
    OMG-Seg: Is One Model Good Enough for All Segmentation?
    Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, and 4 more authors
    In CVPR, 2024
  12. Vision & Multimodality
    Towards Language-Driven Video Inpainting via Multimodal Large Language Models
    Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, and 6 more authors
    In CVPR, 2024
  13. LLM/VLM
    InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
    Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, and 4 more authors
    arXiv, 2024
  14. LLM/VLM
    MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
    Xinyu Fang, Kangrui Mao, Haodong Duan, Xiangyu Zhao, Yining Li, and 2 more authors
    In NeurIPS Datasets and Benchmarks Track, 2024
  15. Vision & Multimodality
    An Open and Comprehensive Pipeline for Unified Object Grounding and Detection
    Xiangyu Zhao, Yicheng Chen, Shilin Xu, Xiangtai Li, Xinjiang Wang, and 2 more authors
    arXiv, 2024
  16. Vision & Multimodality
    RAP-SAM: Towards Real-Time All-Purpose Segment Anything
    Shilin Xu, Haobo Yuan, Qingyu Shi, Lu Qi, Jingbo Wang, and 7 more authors
    arXiv, 2024

2023

  1. Vision & Multimodality
    RTMPose: Real-Time Multi-Person Pose Estimation Based on MMPose
    Tao Jiang, Peng Lu, Li Zhang, Ningsheng Ma, Rui Han, and 3 more authors
    arXiv, 2023
  2. Vision & Multimodality
    DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection
    Shilin Xu, Xiangtai Li, Size Wu, Wenwei Zhang, Yining Li, and 4 more authors
    arXiv, 2023
  3. Agent
    AgentLego: Open-Source Tool API Library to Extend and Enhance LLM Agents
    AgentLego Contributors
    2023

2020

  1. Vision & Multimodality
    OpenMMLab Pose Estimation Toolbox and Benchmark
    MMPose Contributors
    GitHub, 2020

2019

  1. Vision & Multimodality
    Deep Imbalanced Learning for Face Recognition and Attribute Prediction
    Chen Huang, Yining Li, Chen Change Loy, and Xiaoou Tang
    IEEE TPAMI, 2019
  2. Vision & Multimodality
    Dense Intrinsic Appearance Flow for Human Pose Transfer
    Yining Li, Chen Huang, and Chen Change Loy
    In CVPR, 2019

2017

  1. Vision & Multimodality
    Learning to Disambiguate by Asking Discriminative Questions
    Yining Li, Chen Huang, Xiaoou Tang, and Chen Change Loy
    In ICCV, 2017

2016

  1. Vision & Multimodality
    Learning Deep Representation for Imbalanced Classification
    Chen Huang, Yining Li, Chen Change Loy, and Xiaoou Tang
    In CVPR Spotlight, 2016
  2. Vision & Multimodality
    Human Attribute Recognition by Deep Hierarchical Contexts
    Yining Li, Chen Huang, Chen Change Loy, and Xiaoou Tang
    In ECCV, 2016