Commit 9cbf7ca

Add api documentation
1 parent f61efa5 commit 9cbf7ca

2 files changed

Lines changed: 218 additions & 0 deletions

File tree

README.md

Lines changed: 4 additions & 0 deletions
@@ -92,6 +92,10 @@ To run inference on Human3.6M:
sh ./scripts/FMPose3D_test.sh
```

### Inference API

FMPose3D also ships a high-level Python API for end-to-end 3D pose estimation from images. See the [Inference API documentation](fmpose3d/inference_api/README.md) for the full reference.

## Experiments on non-human animals

For animal training/testing and demo scripts, see [animals/README.md](animals/README.md).

fmpose3d/inference_api/README.md

Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
# FMPose3D Inference API

## Overview

This inference API provides a high-level, end-to-end interface for monocular 3D pose estimation using flow matching. It wraps the full pipeline — input ingestion, 2D keypoint detection, and 3D lifting — behind a single `FMPose3DInference` class, supporting both **human** (17-joint H36M) and **animal** (26-joint Animal3D) skeletons. Model weights are downloaded automatically from HuggingFace when not provided locally.
---

## Quick Examples

**Human pose estimation (end-to-end):**

```python
from fmpose3d import FMPose3DInference, FMPose3DConfig

# Create a config (optional)
config = FMPose3DConfig(model_type="fmpose3d_humans")  # or "fmpose3d_animals"

# Initialize the API
api = FMPose3DInference(config)  # weights auto-downloaded

# Predict from a source (path, or an image array)
result = api.predict("photo.jpg")
print(result.poses_3d.shape)        # (1, 17, 3)
print(result.poses_3d_world.shape)  # (1, 17, 3)
```
**Human pose estimation (two-step):**

```python
from fmpose3d import FMPose3DInference

api = FMPose3DInference(model_weights_path="weights.pth")

# The 2D and 3D inference steps can be called separately
result_2d = api.prepare_2d("photo.jpg")
result_3d = api.pose_3d(result_2d.keypoints, result_2d.image_size)
```
**Animal pose estimation:**

```python
from fmpose3d import FMPose3DInference

# The API provides a convenience constructor for the animal config
api = FMPose3DInference.for_animals()
result = api.predict("dog.jpg")
print(result.poses_3d.shape)  # (1, 26, 3)
```
## API Documentation

### `FMPose3DInference` — Main Inference Class

The high-level entry point. Manages the full pipeline: input ingestion, 2D estimation, and 3D lifting.

#### Constructor

```python
FMPose3DInference(
    model_cfg: FMPose3DConfig | None = None,
    inference_cfg: InferenceConfig | None = None,
    model_weights_path: str | Path | None = None,
    device: str | torch.device | None = None,
    *,
    estimator_2d: HRNetEstimator | SuperAnimalEstimator | None = None,
    postprocessor: HumanPostProcessor | AnimalPostProcessor | None = None,
)
```
| Parameter | Description |
|---|---|
| `model_cfg` | Model architecture settings. Defaults to human (17 H36M joints). |
| `inference_cfg` | Inference settings (sample steps, test augmentation, etc.). |
| `model_weights_path` | Path to a `.pth` checkpoint. `None` triggers automatic download from HuggingFace. |
| `device` | Compute device. `None` auto-selects CUDA if available. |
| `estimator_2d` | Override the 2D pose estimator (auto-selected by default). |
| `postprocessor` | Override the post-processor (auto-selected by default). |
#### `FMPose3DInference.for_animals(...)` — Class Method

```python
@classmethod
def for_animals(
    cls,
    model_weights_path: str | None = None,
    *,
    device: str | torch.device | None = None,
    inference_cfg: InferenceConfig | None = None,
) -> FMPose3DInference
```

Convenience constructor for the **animal** pipeline. Sets `model_type="fmpose3d_animals"`, loads the appropriate config (26-joint Animal3D skeleton), and disables flip augmentation by default.

---
### Public Methods

#### `predict(source, *, camera_rotation, seed, progress)` → `Pose3DResult`

End-to-end prediction: 2D estimation followed by 3D lifting in a single call.

| Parameter | Type | Description |
|---|---|---|
| `source` | `Source` | Image path, directory, numpy array `(H,W,C)` or `(N,H,W,C)`, or a list thereof. Video files are not supported. |
| `camera_rotation` | `ndarray \| None` | Length-4 quaternion for camera-to-world rotation. Defaults to the official demo rotation; pass `None` to skip the transform. Ignored for animals. |
| `seed` | `int \| None` | Seed for reproducible sampling. |
| `progress` | `ProgressCallback \| None` | Callback `(current_step, total_steps) -> None`. |

**Returns:** `Pose3DResult`

---
115+
#### `prepare_2d(source, progress)` → `Pose2DResult`
116+
117+
Runs only the 2D pose estimation step.
118+
119+
| Parameter | Type | Description |
120+
|---|---|---|
121+
| `source` | `Source` | Same flexible input as `predict()`. |
122+
| `progress` | `ProgressCallback \| None` | Optional progress callback. |
123+
124+
**Returns:** `Pose2DResult` containing `keypoints`, `scores`, and `image_size`.
125+
126+
---
#### `pose_3d(keypoints_2d, image_size, *, camera_rotation, seed, progress)` → `Pose3DResult`

Lifts pre-computed 2D keypoints to 3D using the flow-matching model.

| Parameter | Type | Description |
|---|---|---|
| `keypoints_2d` | `ndarray` | Shape `(num_persons, num_frames, J, 2)` or `(num_frames, J, 2)`. The first person is used if the array is 4D. |
| `image_size` | `tuple[int, int]` | `(height, width)` of the source frames. |
| `camera_rotation` | `ndarray \| None` | Camera-to-world quaternion (human only). |
| `seed` | `int \| None` | Seed for reproducible results. |
| `progress` | `ProgressCallback \| None` | Per-frame progress callback. |

**Returns:** `Pose3DResult`

---
#### `setup_runtime()`

Manually initializes all runtime components (2D estimator, 3D model, weights). Called automatically on the first use of `predict`, `prepare_2d`, or `pose_3d`.

---
### Types & Data Classes

#### `Source`

Accepted source types for `FMPose3DInference.predict` and `prepare_2d`:

- `str` or `Path` — path to an image file or a directory of images.
- `np.ndarray` — a single frame `(H, W, C)` or a batch `(N, H, W, C)`.
- `list` — a list of file paths or a list of `(H, W, C)` arrays.

```python
Source = Union[str, Path, np.ndarray, Sequence[Union[str, Path, np.ndarray]]]
```
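For illustration, each member of the `Source` union can be constructed as follows (the file names are placeholders):

```python
import numpy as np
from pathlib import Path

# A single frame: (H, W, C)
frame = np.zeros((480, 640, 3), dtype=np.uint8)

# A batch of frames: (N, H, W, C)
batch = np.stack([frame] * 4)

# A sequence of paths (str and Path both accepted)
paths = [Path("img_000.jpg"), "img_001.jpg"]

# Each of these is a valid `source` value:
# api.predict(frame); api.predict(batch); api.predict(paths)
```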
#### `Pose2DResult`

| Field | Type | Description |
|---|---|---|
| `keypoints` | `ndarray` | 2D keypoints, shape `(num_persons, num_frames, J, 2)`. |
| `scores` | `ndarray` | Per-joint confidence, shape `(num_persons, num_frames, J)`. |
| `image_size` | `tuple[int, int]` | `(height, width)` of source frames. |

#### `Pose3DResult`

| Field | Type | Description |
|---|---|---|
| `poses_3d` | `ndarray` | Root-relative 3D poses, shape `(num_frames, J, 3)`. |
| `poses_3d_world` | `ndarray` | Post-processed 3D poses, shape `(num_frames, J, 3)`. For humans: world-coordinate poses. For animals: limb-regularized poses. |

---
### 2D Estimators

#### `HRNetEstimator(cfg: HRNetConfig | None)`

Default 2D estimator for the human pipeline. Wraps HRNet + YOLO with a COCO-to-H36M keypoint conversion.

- `setup_runtime()` — Loads the YOLO + HRNet models.
- `predict(frames: ndarray)` → `(keypoints, scores)` — Returns H36M-format 2D keypoints from BGR frames `(N, H, W, C)`.

#### `SuperAnimalEstimator(cfg: SuperAnimalConfig | None)`

2D estimator for the animal pipeline. Uses DeepLabCut SuperAnimal and maps quadruped80K keypoints to the 26-joint Animal3D layout.

- `setup_runtime()` — No-op (DLC loads lazily).
- `predict(frames: ndarray)` → `(keypoints, scores)` — Returns Animal3D-format 2D keypoints from BGR frames.
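Since both estimators share the `setup_runtime()` / `predict(frames) -> (keypoints, scores)` interface, the constructor's `estimator_2d` override accepts any object that implements it. A hypothetical stand-in (not part of fmpose3d) that returns fixed keypoints, which can be useful for exercising the 3D stage in isolation:

```python
import numpy as np

class Constant2DEstimator:
    """Illustrative stand-in for a 2D estimator. Hypothetical: returns the
    image center for every joint instead of running a real detector."""

    def __init__(self, num_joints: int = 17) -> None:
        self.num_joints = num_joints

    def setup_runtime(self) -> None:
        pass  # nothing to load

    def predict(self, frames: np.ndarray):
        # frames: BGR batch of shape (N, H, W, C)
        n, h, w = frames.shape[0], frames.shape[1], frames.shape[2]
        # One "person"; every joint placed at the image center (x, y).
        keypoints = np.full((1, n, self.num_joints, 2), [w / 2, h / 2])
        scores = np.ones((1, n, self.num_joints))
        return keypoints, scores

# Hypothetical injection via the constructor override:
# api = FMPose3DInference(estimator_2d=Constant2DEstimator())
```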
---

### Post-Processors

#### `HumanPostProcessor`

Zeros the root joint (making poses root-relative) and applies the `camera_to_world` rotation.
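As a sketch of what the camera-to-world step amounts to, the snippet below rotates root-relative joints by a unit quaternion. This is an assumed implementation with an assumed `(w, x, y, z)` component order; the library's actual routine and quaternion convention may differ.

```python
import numpy as np

def camera_to_world(poses: np.ndarray, quat: np.ndarray) -> np.ndarray:
    """Rotate (num_frames, J, 3) poses by a unit quaternion (w, x, y, z)."""
    w, x, y, z = quat
    # Standard quaternion -> rotation matrix conversion.
    rot = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
    # Row-vector convention: each joint p maps to p @ R^T.
    return poses @ rot.T
```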
#### `AnimalPostProcessor`

Applies limb regularization (rotates the pose so that the average limb direction is vertical). No root zeroing or camera-to-world transform.

---
