MMPose
OpenMMLab Pose Estimation Toolbox and Benchmark
MMPose is an open-source pose estimation toolbox built on PyTorch and a member of the OpenMMLab project. It provides a unified codebase covering 2D / 3D body, hand, face, whole-body, and animal pose, with a fully modular design, production-ready deployment paths, and an extensive collection of pretrained models. As of the 1.3 release, MMPose ships 18+ algorithms, 9+ training/inference techniques, and benchmarks on 35+ datasets.
I was a core member of OpenMMLab and led the development of MMPose — from the initial public release through the 1.x series.
Highlights
- Comprehensive task coverage. Top-down and bottom-up 2D pose, 3D body / hand / whole-body, fashion landmark detection, animal pose, and video pose tracking — all in one toolbox.
- State-of-the-art models. From classics (HRNet, SimpleBaseline, Hourglass) to modern representations (SimCC, DSNT, RLE) and the real-time RTMPose series featured below.
- Production-grade deployment. First-class support via MMDeploy for ONNX, TensorRT, ncnn, OpenVINO, CoreML, and TorchScript, including mobile / edge backends.
- Reproducible research. Unified configs, model zoo, training recipes, and detailed evaluation scripts to reproduce published numbers.
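The AP figures quoted on this page follow the COCO keypoint protocol, which scores each prediction by Object Keypoint Similarity (OKS) before thresholding. Below is a rough sketch of OKS for a single person in plain Python. The helper name `oks` and the toy sigma values are illustrative assumptions, not MMPose functions or official constants, and the annotation area is assumed to be positive.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def oks(
    pred: Sequence[Point],
    gt: Sequence[Point],
    visible: Sequence[bool],
    sigmas: Sequence[float],
    area: float,
) -> float:
    """Object Keypoint Similarity between one prediction and one annotation.

    Each labeled keypoint contributes exp(-d^2 / (2 * area * (2 * sigma)^2)),
    and the result is the mean over labeled keypoints.
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), vis, sigma in zip(pred, gt, visible, sigmas):
        if not vis:
            continue  # unlabeled keypoints do not affect the score
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * area * (2.0 * sigma) ** 2))
        count += 1
    return total / count if count else 0.0


# Toy data: two keypoints, illustrative sigmas, a 10x10 px person area.
gt_points = [(10.0, 10.0), (20.0, 20.0)]
shifted = [(11.0, 10.0), (21.0, 20.0)]  # every keypoint off by 1 px
toy_sigmas = [0.05, 0.05]

print(oks(gt_points, gt_points, [True, True], toy_sigmas, area=100.0))  # 1.0
print(oks(shifted, gt_points, [True, True], toy_sigmas, area=100.0))
```

A perfect prediction scores 1.0, and the score decays with pixel error relative to the person's size, which is why AP stays comparable across small and large instances.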
The RTMPose Series
The RTMPose family is a line of real-time pose estimators we developed inside MMPose to close the long-standing gap between accuracy and deployment speed. The series now includes three models — RTMPose for 2D body pose, RTMO for one-stage multi-person pose, and RTMW for real-time whole-body pose.
RTMPose — Real-Time 2D Multi-Person Pose Estimation
Code · Paper (arXiv:2303.07399)
RTMPose is a top-down pose estimator with a SimCC-style coordinate-classification head, optimized end-to-end for both accuracy and on-device latency. Headline numbers from the paper:
- RTMPose-m: 75.8% AP on COCO with 90+ FPS on an Intel i7-11700 CPU and 430+ FPS on an NVIDIA GTX 1660 Ti GPU.
- RTMPose-s: 72.2% AP on COCO with 70+ FPS on a Snapdragon 865 mobile chip.
- Variants: t / s / m / l / x, in 256×192 and 384×288 input resolutions.
- Deployment: ONNX, TensorRT, ncnn, OpenVINO, CoreML — all bundled via MMDeploy.
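A SimCC-style head treats localization as two 1-D classification problems per keypoint, one over horizontal bins and one over vertical bins. Here is a minimal sketch of the decoding step in plain Python, assuming a split ratio of 2.0 bins per pixel and toy scores. The function name `decode_simcc` is hypothetical and this is not MMPose's actual implementation.

```python
from typing import List, Sequence, Tuple


def decode_simcc(
    x_logits: Sequence[Sequence[float]],
    y_logits: Sequence[Sequence[float]],
    split_ratio: float = 2.0,
) -> List[Tuple[float, float, float]]:
    """Decode per-keypoint 1-D classification scores into (x, y, score).

    x_logits[k] holds one score per horizontal bin and y_logits[k] one per
    vertical bin. With split_ratio bins per pixel, bin index / split_ratio
    gives a coordinate in input-image pixels.
    """
    keypoints = []
    for x_row, y_row in zip(x_logits, y_logits):
        x_bin = max(range(len(x_row)), key=x_row.__getitem__)
        y_bin = max(range(len(y_row)), key=y_row.__getitem__)
        # Use the weaker of the two peak scores as a confidence proxy.
        score = min(x_row[x_bin], y_row[y_bin])
        keypoints.append((x_bin / split_ratio, y_bin / split_ratio, score))
    return keypoints


# Toy example: one keypoint on an 8x6 px input, so 16 x-bins and 12 y-bins.
x_scores = [[0.0] * 16]
y_scores = [[0.0] * 12]
x_scores[0][11] = 0.9  # peak at x-bin 11, i.e. x = 5.5 px
y_scores[0][4] = 0.8   # peak at y-bin 4, i.e. y = 2.0 px
print(decode_simcc(x_scores, y_scores))  # [(5.5, 2.0, 0.8)]
```

Because the bins are finer than pixels, the decoded coordinate has sub-pixel resolution without the 2-D heatmap upsampling that classic heatmap heads rely on.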
RTMO — One-Stage Real-Time Multi-Person Pose Estimation
Code · Paper (arXiv:2312.07526)
Where RTMPose is top-down (detect → crop → pose), RTMO is one-stage: it predicts the keypoints of every person in a single pass over the full image, which removes the separate detector and per-person crops while keeping accuracy competitive.
- RTMO-l: 74.8% AP on COCO val2017 at 141 FPS on a single V100, eliminating the detector + per-person crop overhead.
- Strong robustness on crowded scenes (CrowdPose).
- Variants: t / s / m / l, trained on COCO and the broader body7 mixture.
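The structural difference between the two designs can be made concrete with a toy latency model. A top-down pipeline pays for one detector pass plus one pose pass per detected person, while a one-stage model pays for a single pass regardless of crowd size. The sketch below uses made-up millisecond costs purely for illustration. They are not measurements of RTMPose or RTMO, the function names are hypothetical, and real top-down pipelines can batch crops, so the linear scaling is a simplification.

```python
def top_down_latency_ms(num_people: int, detector_ms: float, pose_ms: float) -> float:
    """Detector runs once; the pose network runs once per person crop."""
    return detector_ms + num_people * pose_ms


def one_stage_latency_ms(num_people: int, model_ms: float) -> float:
    """A single forward pass over the full image, independent of num_people."""
    return model_ms


# Hypothetical costs, chosen only to show the scaling behaviour.
DETECTOR_MS, POSE_MS, ONE_STAGE_MS = 5.0, 1.0, 8.0

for n in (1, 3, 6, 12):
    td = top_down_latency_ms(n, DETECTOR_MS, POSE_MS)
    os_ = one_stage_latency_ms(n, ONE_STAGE_MS)
    faster = "one-stage" if os_ < td else "top-down"
    print(f"{n:>2} people: top-down {td:4.1f} ms, one-stage {os_:4.1f} ms -> {faster}")
```

Under these assumed costs the top-down pipeline is ahead for one or two people, and the one-stage model pulls ahead as the scene gets more crowded.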
RTMW — Real-Time Whole-Body Pose Estimation
Code · Paper (arXiv:2407.08634)
RTMW (Real-Time Multi-person Whole-body) closes the same speed/accuracy gap for whole-body pose, which is critical for downstream tasks like motion capture, AIGC, and sign-language understanding.
- RTMW-x at 384×288: 70.2% Whole AP / 78.1% Whole AR on COCO-WholeBody.
- RTMW-l also reaches 70.1% Whole AP, with mobile-friendly variants in the m / l / x family.
- Trained on Cocktail14, a curated mixture of 14 public datasets spanning whole-body, body, face, and hand keypoints, which covers a much broader appearance distribution than COCO-WholeBody alone.
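Whole-body models predict body, foot, face, and hand keypoints in a single output. Here is a small sketch of how such an output can be split by part, assuming the 133-keypoint COCO-WholeBody ordering (17 body, 6 foot, 68 face, 21 per hand). The helper name `split_wholebody` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# (part name, number of keypoints), in COCO-WholeBody order.
PART_SIZES = [
    ("body", 17),
    ("foot", 6),
    ("face", 68),
    ("left_hand", 21),
    ("right_hand", 21),
]
NUM_KEYPOINTS = sum(size for _, size in PART_SIZES)  # 133


def split_wholebody(keypoints: Sequence[Point]) -> Dict[str, List[Point]]:
    """Slice a flat 133-keypoint prediction into per-part lists."""
    if len(keypoints) != NUM_KEYPOINTS:
        raise ValueError(f"expected {NUM_KEYPOINTS} keypoints, got {len(keypoints)}")
    parts: Dict[str, List[Point]] = {}
    start = 0
    for name, size in PART_SIZES:
        parts[name] = list(keypoints[start:start + size])
        start += size
    return parts


# Dummy prediction: keypoint i sits at (i, i).
dummy = [(float(i), float(i)) for i in range(NUM_KEYPOINTS)]
parts = split_wholebody(dummy)
print({name: len(points) for name, points in parts.items()})
```

Downstream tasks such as sign-language understanding typically consume only the hand and face slices, which is one reason per-part accuracy matters alongside the overall Whole AP.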
Resources
- Code: github.com/open-mmlab/mmpose
- Documentation: mmpose.readthedocs.io
- Demos: mmpose.readthedocs.io/en/latest/demos.html
- Model zoo: mmpose.readthedocs.io/en/latest/model_zoo.html