PHD
Personalized 3D Human Body Fitting with Point Diffusion

ICCV 2025

(Equal Advisory)
1 ETH Zürich
2 ETH AI Center
3 Meta
Abstract

We introduce PHD, a novel approach for personalized 3D human mesh recovery (HMR) and body fitting that leverages user-specific shape information to improve pose estimation accuracy from videos. Traditional HMR methods are designed to be user-agnostic and optimized for generalization. While these methods often refine poses using constraints derived from the 2D image to improve alignment, this process compromises 3D accuracy by failing to jointly account for person-specific body shapes and the plausibility of 3D poses. In contrast, our pipeline decouples this process by first calibrating the user's body shape and then employing a personalized pose fitting process conditioned on that shape. To achieve this, we develop a body shape-conditioned 3D pose prior, implemented as a Point Diffusion Transformer, which iteratively guides the pose fitting via a Point Distillation Sampling loss. This learned 3D pose prior effectively mitigates errors arising from an over-reliance on 2D constraints. Consequently, our approach improves not only pelvis-aligned pose accuracy but also absolute pose accuracy -- an important metric often overlooked by prior work. Furthermore, our method is highly data-efficient, requiring only synthetic data for training, and serves as a versatile plug-and-play module that can be seamlessly integrated with existing 3D pose estimators to enhance their performance.

Method

SHAPify

We estimate a person's 3D body shape from a single image taken in a reference pose. Given a known camera focal length, the method minimizes the 2D keypoint reprojection error. Because the camera pitch angle is unknown, this objective alone is ambiguous, so we regularize the optimization with a minimal set of personal measurements (body height and weight), which constrains the solution to an accurate body shape.

Figure (method diagram): SHAPify estimates a person's 3D body shape from a single image. The problem is ill-posed because the camera pitch angle is unknown. We resolve this by using only the person's body height and weight as regularization in the optimization process.
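The role of the measurement regularizer can be illustrated with a toy objective. The sketch below is not the authors' implementation: the function names, the pinhole camera convention, the four hand-picked keypoints, and the regularization weight are all assumptions made for illustration. It uses height only (not weight), and the ambiguity it demonstrates is body scale versus camera distance, a simpler stand-in for the pitch ambiguity described above.

```python
import numpy as np

def rotation_x(pitch):
    """Rotation about the camera x-axis by `pitch` radians."""
    c, s = np.cos(pitch), np.sin(pitch)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def project(points_3d, focal, pitch, translation):
    """Pinhole projection of (N, 3) points with a known focal length."""
    cam = points_3d @ rotation_x(pitch).T + translation
    return focal * cam[:, :2] / cam[:, 2:3]

def shape_objective(points_3d, keypoints_2d, focal, pitch, translation,
                    known_height, reg_weight=1.0):
    """Mean squared reprojection error plus a height regularizer."""
    residual = project(points_3d, focal, pitch, translation) - keypoints_2d
    reprojection_error = np.mean(np.sum(residual ** 2, axis=1))
    estimated_height = points_3d[:, 1].max() - points_3d[:, 1].min()
    return reprojection_error + reg_weight * (estimated_height - known_height) ** 2

# Hypothetical "body": four keypoints, 1.7 m tall.
body = np.array([[0.0, 0.0, 0.0], [0.0, 1.7, 0.0],
                 [0.3, 1.0, 0.1], [-0.3, 1.0, -0.1]])
focal, pitch = 1000.0, 0.1
translation = np.array([0.0, -0.85, 3.0])
observed_2d = project(body, focal, pitch, translation)

# A body 20% larger and 20% farther away projects to the same 2D keypoints.
scale = 1.2
loss_true = shape_objective(body, observed_2d, focal, pitch, translation, 1.7)
loss_scaled_no_reg = shape_objective(scale * body, observed_2d, focal, pitch,
                                     scale * translation, 1.7, reg_weight=0.0)
loss_scaled_reg = shape_objective(scale * body, observed_2d, focal, pitch,
                                  scale * translation, 1.7, reg_weight=1.0)
```

Without the regularizer, the true body and the scaled body are indistinguishable from the 2D keypoints alone. With the height term, only the true body reaches a near-zero objective.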

Shape-Conditioned Point Diffusion Transformer

To facilitate personalized 3D pose fitting, we propose PointDiT, a novel point diffusion transformer that generates 3D human poses conditioned on both the input image and the individual's body shape. PointDiT employs a rectified flow formulation to iteratively denoise random point clouds in as few as 5 denoising steps. This approach enables the generation of plausible 3D poses that align well with the observed 2D data while respecting the unique body shape of the individual.

Figure (method diagram): PointDiT is a novel point diffusion transformer based on the rectified flow formulation. It samples 3D body surface points by iteratively denoising random point clouds, conditioned on the image tokens and the individual's body shape.
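Few-step sampling under a rectified flow can be sketched as Euler integration of a velocity field. The code below is a minimal illustration, not the PointDiT model: `sample_rectified_flow` and `toy_velocity` are hypothetical names, the time convention (t = 0 is noise, t = 1 is data) is an assumption, and `toy_velocity` is an analytic stand-in for a trained, conditioned network in the special case where the data distribution is a single known point cloud.

```python
import numpy as np

def sample_rectified_flow(velocity_fn, noise, condition, num_steps=5):
    """Integrate dx/dt = v(x, t, condition) from t=0 (noise) to t=1 (data)."""
    points = noise.copy()
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        points = points + dt * velocity_fn(points, t, condition)
    return points

def toy_velocity(points, t, condition):
    """Stand-in for a trained network.

    For the straight path x_t = (1 - t) * noise + t * data with a single
    known data sample, the exact velocity is (data - x_t) / (1 - t).
    Here `condition` plays the role of that data sample.
    """
    return (condition - points) / (1.0 - t)

rng = np.random.default_rng(0)
target_points = rng.standard_normal((64, 3))   # hypothetical body surface points
initial_noise = rng.standard_normal((64, 3))   # random point cloud to denoise
sampled_points = sample_rectified_flow(toy_velocity, initial_noise,
                                       target_points, num_steps=5)
```

Because the toy paths are perfectly straight, five Euler steps recover the target exactly. A learned velocity field is only approximately straight, so the step count trades speed against sample quality.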

Point Distilled Body Fitting

The learned PointDiT model serves as a powerful 3D pose prior that guides the body fitting and mitigates errors caused by over-reliance on 2D constraints. Inspired by Score Distillation Sampling, we introduce a Point Distillation Sampling loss that leverages PointDiT to refine 3D pose fitting.

Figure (method diagram): The Point Distillation Sampling loss guides 3D pose fitting by leveraging PointDiT, which incorporates personal body shape information and prevents overfitting to 2D keypoints.
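By analogy with Score Distillation Sampling, a distillation-style update can be sketched as follows: perturb the current body points with noise, ask the prior for a one-step clean estimate, and pull the current points toward that estimate while treating it as a constant. This is an assumed formulation for illustration only; the exact loss, weighting, and noise schedule are defined in the paper. The names `pds_gradient` and `toy_velocity` are hypothetical, the prior is a toy stand-in for PointDiT, and the points are optimized directly here, whereas in body fitting they would be produced by body model parameters.

```python
import numpy as np

def toy_velocity(points, t, condition):
    """Stand-in prior: exact rectified-flow velocity toward one known point cloud."""
    return (condition - points) / (1.0 - t)

def pds_gradient(points, velocity_fn, condition, rng, t_range=(0.2, 0.8)):
    """Gradient of 0.5 * ||points - stopgrad(denoised)||^2 w.r.t. `points`."""
    t = rng.uniform(*t_range)
    noise = rng.standard_normal(points.shape)
    noisy = (1.0 - t) * noise + t * points           # perturb the current fit
    velocity = velocity_fn(noisy, t, condition)      # query the prior
    denoised = noisy + (1.0 - t) * velocity          # one-step clean estimate
    return points - denoised                         # denoised is held constant

rng = np.random.default_rng(0)
plausible_points = rng.standard_normal((64, 3))      # what the toy prior prefers
fitted_points = plausible_points + 0.5 * rng.standard_normal((64, 3))
initial_error = np.abs(fitted_points - plausible_points).max()

learning_rate = 0.5
for _ in range(20):
    fitted_points = fitted_points - learning_rate * pds_gradient(
        fitted_points, toy_velocity, plausible_points, rng)
final_error = np.abs(fitted_points - plausible_points).max()
```

In a full fitting loop this gradient would be combined with the 2D reprojection term, so that the prior counteracts solutions that match the keypoints but are implausible in 3D.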

Results

3D Pose Fitting Accuracy

We compare our approach to ScoreHMR, a recent diffusion-based pose fitting method which also leverages a body prior but still relies heavily on regressors for initialization.

Figure (result): Comparison of sampled poses and body fitting against ScoreHMR. Our method matches both the 2D images and the 3D ground truth (grey meshes) better.

While the 2D reprojections of ScoreHMR appear reasonable, the method struggles to produce consistent and accurate body shapes over time. In the 3D view, ScoreHMR also fails to predict correct and stable 3D poses.

Citation

If you find our work useful, please consider citing:

@inproceedings{ho2025phd,
  title={PHD: Personalized 3D Human Body Fitting with Point Diffusion},
  author={Ho, Hsuan-I and Guo, Chen and Wu, Po-Chen and Shugurov, Ivan and Tang, Chengcheng and Mittal, Abhay and An, Sizhe and Kaufmann, Manuel and Zhang, Linguang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}