PX Pose is the first half of the pipeline and estimates bimanual poses from each DF-1 file. The modules can also output object poses, but this requires additional preparation steps to be completed beforehand.
docker start px_pipeline
docker exec -it px_pipeline bash
Run the single script pose_main.sh under /home/app/pipeline:
cd pipeline
bash pose_main.sh <h5_file_path>
- h5_file_path is where you put the input DF-1 files. Do not put other HDF5 files in this directory, as they could cause confusion.
- We recommend dividing DF-1 files into batches so that processing each h5_file_path does not take excessive time.
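One way to follow the batching recommendation is to distribute the DF-1 files into fixed-size subdirectories and run the script once per subdirectory. The sketch below is only an illustration: the function name split_into_batches, the batch_NNN directory naming, and the .hdf5 file extension are assumptions and not part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline): move DF-1 files from one
# directory into numbered batch subdirectories of at most <batch_size> files.
# Assumption: DF-1 files use the .hdf5 extension; adjust the glob if they do not.
split_into_batches() (
    src_dir="$1"
    batch_size="$2"
    [ "$batch_size" -ge 1 ] || { echo "batch size must be >= 1" >&2; return 1; }
    count=0
    for file in "$src_dir"/*.hdf5; do
        [ -e "$file" ] || continue   # the glob matched nothing
        # Integer division picks the batch index: files 0..N-1 go to batch_000, etc.
        batch_dir=$(printf '%s/batch_%03d' "$src_dir" "$((count / batch_size))")
        mkdir -p "$batch_dir"
        mv "$file" "$batch_dir/"
        count=$((count + 1))
    done
    echo "$count"                    # number of files moved
)

# Usage inside the container (paths are examples):
#   split_into_batches /ws/df_1/hdf5 20
#   for d in /ws/df_1/hdf5/batch_*; do bash pose_main.sh "$d"; done
```

Each batch directory then acts as its own h5_file_path, so results for one batch do not depend on the others finishing.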
The pipeline can work with or without object poses. However, including them will increase the accuracy of retargeting. This section describes how to prepare the inputs for object pose estimation.
To estimate object poses, you must provide first-frame object masks (see the data sample for reference). Each mask image can be extracted from one of the three RGBD cameras: 0, 1, or 2. The camera used for the object masks must differ from the one used for bimanual poses.
After segmentation, reorganize and place all mask images from the same batch under a single directory, referred to as mask_frame_path. It should contain exactly one XML file named annotations.xml, whose structure can also be found in the data sample.
Masks in the annotation file are compressed using run-length encoding. The objects' names and IDs are also stored as text labels in the XML.
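As a reminder of what run-length encoding means, the sketch below expands a list of run lengths into a flat sequence of mask pixels. It is a generic illustration, not the pipeline's decoder: the function name rle_decode and the convention that runs alternate between background (0) and foreground (1), starting with background, are assumptions. Check the data sample for the exact layout used in annotations.xml.

```shell
# Hypothetical helper: expand comma-separated run lengths into 0/1 pixels.
# Assumption: runs alternate background (0) and foreground (1), background first.
rle_decode() (
    printf '%s\n' "$1" | awk -F',' '{
        out = ""
        value = 0
        for (i = 1; i <= NF; i++) {
            # Append <run length> copies of the current pixel value.
            for (j = 0; j < $i + 0; j++) out = out value
            value = 1 - value      # switch between background and foreground
        }
        print out
    }'
)

rle_decode "2,3,1"   # prints 001110
```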
The program will not output object poses if it fails to detect either the XML file or the corresponding object mesh files (OBJ models). For convenience, some OBJ models are already included inside the Docker image under /home/app/pipeline/objects.
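Because missing inputs only show up as absent object poses, it can help to check them before a long run. The following is a minimal sketch of such a check, not something shipped with the pipeline: the function name check_object_inputs is hypothetical, the lowercase .obj extension is an assumption, and the check cannot tell whether the meshes present actually correspond to the objects in the annotations.

```shell
# Hypothetical pre-flight check (not part of pose_main.sh).
# Succeeds only if <mask_frame_path> is a directory with exactly one XML file
# named annotations.xml and <objects_dir> contains at least one .obj file.
check_object_inputs() (
    mask_dir="$1"
    objects_dir="${2:-/home/app/pipeline/objects}"

    # A path to annotations.xml itself is rejected: a directory is expected.
    if [ ! -d "$mask_dir" ]; then
        echo "mask_frame_path must be a directory: $mask_dir" >&2
        return 1
    fi

    # Count the XML files directly inside the mask directory.
    xml_count=0
    for f in "$mask_dir"/*.xml; do
        [ -e "$f" ] && xml_count=$((xml_count + 1))
    done
    if [ "$xml_count" -ne 1 ] || [ ! -f "$mask_dir/annotations.xml" ]; then
        echo "expected exactly one XML file named annotations.xml in $mask_dir" >&2
        return 1
    fi

    # Assumption: meshes use the lowercase .obj extension.
    for f in "$objects_dir"/*.obj; do
        [ -e "$f" ] && return 0
    done
    echo "no .obj mesh found in $objects_dir" >&2
    return 1
)

# Usage (paths are examples):
#   check_object_inputs /ws/df_1/xml && bash pose_main.sh /ws/df_1/hdf5 /ws/df_1/xml
```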
# [HOST]
docker start px_pose
docker exec -it px_pose bash
# [CONTAINER]
cd /home/app/pipeline
bash pose_main.sh <h5_file_path> <mask_frame_path>
# Example: bash pose_main.sh /ws/df_1/hdf5 /ws/df_1/xml
# Do NOT run: bash pose_main.sh /ws/df_1/hdf5 /ws/df_1/xml/annotations.xml
You can adjust the parameters in the Custom Parameter section of the main script to optimize runtime performance.
# -------- Custom Parameter --------- #
CAM_HAND=0 # "0", "1", [or] "2"; for bracelet pose
CAM_OBJ=2 # "0", "1", [or] "2"; for object pose
REMOVE=true # "true" to remove intermediate files after processing
MAX_PARA_POSE=3 # max number of parallel pose estimations
GPU_IDS="0" # visible CUDA device(s)/GPU(s)
# ---- Values Are Case Sensitive ---- #
Increasing MAX_PARA_POSE allows the script to run more pose estimations in parallel (default: 3). Higher values demand more GPU resources, particularly VRAM, so tune this parameter to your GPU's capability.
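A rough way to pick a starting value is to divide the free VRAM by the memory that one pose estimation job needs. The per-job figure is not stated here and has to be measured on your own hardware; the function name suggest_max_para_pose and the 8000 MiB in the usage line are placeholders, not recommendations.

```shell
# Hypothetical helper: how many jobs of <per_job_mib> fit into <free_mib>,
# never suggesting fewer than 1.
suggest_max_para_pose() (
    free_mib="$1"
    per_job_mib="$2"
    n=$((free_mib / per_job_mib))   # integer division rounds down
    [ "$n" -lt 1 ] && n=1
    echo "$n"
)

# Usage on a machine with the NVIDIA driver installed (GPU index 0):
#   free=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits -i 0)
#   suggest_max_para_pose "$free" 8000
```

Treat the result as an upper bound to test against, since memory use can vary between episodes.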
GPU_IDS is a single-digit string in the script that controls which GPU is visible to the pipeline.
On a server with multiple GPUs, make sure it points to the GPU you intend to use so that the hardware is used efficiently.
Alternatively, always set GPU_IDS to 0 and bind each container to one GPU (e.g. via Docker Compose).
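If you prefer plain docker run over Docker Compose, the same one-GPU-per-container binding can be sketched as follows. The function only prints the commands so that they can be reviewed before use. The image name px_pose_image, the container naming scheme, and the sleep infinity keep-alive command are placeholders, since the actual image name and entrypoint are not stated here; volume mounts are omitted.

```shell
# Hypothetical helper: print one `docker run` command per GPU index so that
# each container is started with exactly one GPU attached.
print_gpu_containers() (
    image="$1"
    shift
    for gpu in "$@"; do
        echo "docker run -d --gpus device=$gpu --name px_pose_gpu$gpu $image sleep infinity"
    done
)

# Usage: review the printed commands, then run them on the host.
#   print_gpu_containers px_pose_image 0 1
```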
Each room of the Super EID Factory has 3 RGBD cameras (0, 1, and 2), from which we estimate and track the 6D poses.
CAM_HAND and CAM_OBJ select the RGBD cameras used for hands and objects respectively. Usually 0 produces the best results for hands, and 1 or 2 for objects.
Note: Do not use the same RGBD camera for both hands and objects; CAM_HAND and CAM_OBJ must be different.
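The two constraints on the camera parameters (each value is 0, 1, or 2, and the two values differ) can be expressed as a small check. This is a sketch with a hypothetical function name, validate_cams, and is not part of pose_main.sh.

```shell
# Hypothetical check mirroring the constraints on CAM_HAND and CAM_OBJ.
validate_cams() (
    cam_hand="$1"
    cam_obj="$2"
    # Each camera index must be one of the three RGBD cameras.
    for cam in "$cam_hand" "$cam_obj"; do
        case "$cam" in
            0|1|2) ;;
            *) echo "invalid camera index: $cam" >&2; return 1 ;;
        esac
    done
    # Hands and objects must be tracked from different cameras.
    if [ "$cam_hand" = "$cam_obj" ]; then
        echo "CAM_HAND and CAM_OBJ must be different" >&2
        return 1
    fi
)

# Usage:
#   validate_cams 0 2 && echo "camera settings look valid"
```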
The poses generated by PX Pose are located in the vis_res folder under h5_file_path:
vis_res/
├── episode_XXX_XXXXXX_XX_XXXXXX
│   └── RGBD_0
│       ├── aligned_left_bracelet_pose_results
│       ├── aligned_obj1_pose_results_RGBD_1
│       ├── aligned_obj2_pose_results_RGBD_1
│       ...
│       ├── aligned_right_bracelet_pose_results
│       ├── raw_left_bracelet_poses
│       ├── raw_obj1_poses
│       ├── raw_obj2_poses
│       ...
│       └── raw_right_bracelet_poses
...
├── episode_YYY_YYYYYY_YY_YYYYYY
...
└── episode_ZZZ_ZZZZZZ_ZZ_ZZZZZZ
    └── ...
Poses are stored as TXT files in the subfolders whose names begin with aligned and contain pose_results. Each TXT file contains a homogeneous transformation matrix representing the 6D pose of the bracelet or object in a single frame. The vis_res folder is also the default input for the subsequent PX Post-Process part of the pipeline.
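To inspect a result quickly, the translation part of a pose can be read from its file. The sketch below assumes that each file stores a 4x4 matrix as four whitespace-separated rows, so that the translation is the fourth column of the first three rows; this layout, the lowercase .txt extension in the usage line, and the function name pose_translation are assumptions to verify against your own output.

```shell
# Hypothetical helper: print the translation (x y z) from one pose file.
# Assumption: the file holds a 4x4 matrix, one row per line, whitespace-separated.
pose_translation() (
    awk 'NF >= 4 && rows < 3 {
             # Take the 4th column of each of the first three matrix rows.
             printf "%s%s", (rows ? " " : ""), $4
             rows++
         }
         END { print "" }' "$1"
)

# Usage (paths are examples):
#   for f in /ws/df_1/hdf5/vis_res/episode_*/RGBD_0/aligned_left_bracelet_pose_results/*.txt; do
#       pose_translation "$f"
#   done
```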