The First Part - Pose Estimation

PX Pose is the first half of the pipeline and estimates bimanual poses from each DF-1 file. The modules can also output object poses, but this requires additional preparation steps to be completed beforehand.

Enter the container

docker start px_pipeline
docker exec -it px_pipeline bash

Hands only

Run the script pose_main.sh, located under /home/app/pipeline:

cd /home/app/pipeline
bash pose_main.sh <h5_file_path>
  • h5_file_path is the directory that holds the input DF-1 files. Do not put other HDF5 files in this directory, as they could be mistaken for input.
  • We recommend dividing DF-1 files into batches, one directory per batch, so that processing a single h5_file_path does not take excessive time.
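
The batching recommendation above can be scripted. The following is only a sketch: the function name split_batches, the batch_NNN directory naming, and the .hdf5 file extension are assumptions made for illustration, not part of the pipeline. Each resulting batch directory can then be passed to pose_main.sh as its own h5_file_path.

```shell
# Move the DF-1 files found in SRC into DEST/batch_000, DEST/batch_001, ...
# with at most SIZE files per batch. Prints the number of files moved.
# Assumption: input files end in .hdf5 (override with a 4th argument).
split_batches() (
    src=$1; dest=$2; size=$3; ext=${4:-hdf5}
    [ "$size" -gt 0 ] || { echo "error: batch size must be positive" >&2; exit 1; }
    i=0
    for f in "$src"/*."$ext"; do
        [ -e "$f" ] || continue              # glob matched nothing
        batch=$(printf '%s/batch_%03d' "$dest" $(( i / size )))
        mkdir -p "$batch"
        mv "$f" "$batch"/
        i=$(( i + 1 ))
    done
    echo "$i"
)

# Usage (paths are examples only):
#   split_batches /ws/incoming /ws/batches 20
```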

Hands and objects

The pipeline works with or without object poses, but including them improves the accuracy of retargeting. This section explains how to prepare the inputs for object pose estimation.

Preparing Object Masks

To estimate object poses, you must provide first-frame object masks (see the data sample for reference). Each mask image can be extracted from one of the three RGBD cameras: 0, 1, or 2. The camera used for object poses must not be the same as the one used for bimanual poses.

After segmentation, reorganize and place all mask images from the same batch under a single directory, referred to as mask_frame_path. It should contain exactly one XML file named annotations.xml, whose structure can also be found in the data sample.

Masks in the annotation file are compressed using run-length encoding. The objects' names and IDs are also stored as text labels in the XML.

The program will not output object poses if it fails to detect either the XML file or the corresponding object mesh files (OBJ models). For convenience, some OBJ models are already included in the Docker image under /home/app/pipeline/objects.
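
Because a missing XML file makes the program skip object poses, it may help to check mask_frame_path before starting a long run. The helper below is a hypothetical sketch (check_mask_dir is not part of the pipeline). It assumes annotations.xml sits at the top level of the directory, and it does not check the OBJ models, since how the pipeline matches them to objects is not described here.

```shell
# Succeed only if DIR is a directory whose single XML file is annotations.xml.
check_mask_dir() (
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "error: '$dir' is not a directory (pass the folder, not the XML file)" >&2
        exit 1
    fi
    n=0
    for x in "$dir"/*.xml; do
        if [ -f "$x" ]; then n=$(( n + 1 )); fi
    done
    if [ "$n" -ne 1 ] || [ ! -f "$dir/annotations.xml" ]; then
        echo "error: '$dir' must contain exactly one XML file named annotations.xml" >&2
        exit 1
    fi
)

# Usage (example path):
#   check_mask_dir /ws/df_1/xml && echo "mask directory looks fine"
```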

Running the script

# [HOST]
docker start px_pose
docker exec -it px_pose bash

# [CONTAINER]
cd /home/app/pipeline
bash pose_main.sh <h5_file_path> <mask_frame_path>

# Example: bash pose_main.sh /ws/df_1/hdf5 /ws/df_1/xml
# Do NOT run: bash pose_main.sh /ws/df_1/hdf5 /ws/df_1/xml/annotations.xml
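
When several batches are queued, the call above can be wrapped in a loop. This sketch assumes a layout that mirrors the example paths, with one folder per batch containing hdf5/ and, optionally, xml/. That layout, the function name run_all_batches, and the POSE_CMD override (useful for a dry run) are illustrative assumptions, not pipeline features.

```shell
# Run the pose script once per batch folder under ROOT.
# Batches without xml/annotations.xml are processed in hands-only mode.
# Run this from /home/app/pipeline inside the container.
run_all_batches() (
    root=$1
    for batch in "$root"/*/; do
        batch=${batch%/}
        [ -d "$batch/hdf5" ] || continue     # not a batch folder
        if [ -f "$batch/xml/annotations.xml" ]; then
            ${POSE_CMD:-bash pose_main.sh} "$batch/hdf5" "$batch/xml"
        else
            ${POSE_CMD:-bash pose_main.sh} "$batch/hdf5"
        fi
    done
)

# Dry run that only prints the arguments (example path):
#   POSE_CMD=echo run_all_batches /ws
```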

Custom Parameters

You can adjust the parameters in the Custom Parameter section of the main script to optimize runtime performance.

# -------- Custom Parameter --------- #
CAM_HAND=0                            # "0", "1", [or] "2"; for bracelet pose
CAM_OBJ=2                             # "0", "1", [or] "2"; for object pose
REMOVE=true                           # "true" to remove intermediate files after processing
MAX_PARA_POSE=3                       # max number of parallel pose estimations
GPU_IDS="0"                           # visible CUDA device(s)/GPU(s)
# ---- Values Are Case Sensitive ---- #

Parallel tasks

Increasing the value of MAX_PARA_POSE allows the script to run more pose estimations in parallel (default: 3). Higher values demand more GPU resources, particularly VRAM, so tune this parameter to your GPU's capability.
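
One rough way to choose a starting value is to divide the free VRAM by the memory a single pose estimation uses. The per-task figure is not documented here, so it has to be measured on your own data (for example by watching nvidia-smi during a run with MAX_PARA_POSE=1). The function name and the numbers below are placeholders.

```shell
# Print floor(FREE_MIB / PER_TASK_MIB), but never less than 1.
suggest_max_para() (
    free_mib=$1; per_task_mib=$2
    [ "$per_task_mib" -gt 0 ] || { echo "error: per-task VRAM must be positive" >&2; exit 1; }
    n=$(( free_mib / per_task_mib ))
    [ "$n" -ge 1 ] || n=1
    echo "$n"
)

# Usage on a machine with an NVIDIA driver (6000 MiB per task is a placeholder):
#   free=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits -i 0)
#   suggest_max_para "$free" 6000
```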

Multiple GPUs

GPU_IDS controls which GPU is visible to the script. It is a single-digit value stored as a string inside the script.
If you are running on a server with multiple GPUs, make sure it points to the GPU you intend to use so that the hardware is fully utilized.

Alternatively, always set GPU_IDS to 0 and bind each container to a single GPU (e.g. via Docker Compose).

RGBD camera index

Each room of the Super EID Factory has 3 RGBD cameras (0, 1, and 2), from which we estimate and track the 6D poses.
CAM_HAND and CAM_OBJ select the RGBD cameras used for hands and objects respectively. Usually 0 produces the best results for hands, and 1 or 2 for objects.
Note: CAM_HAND and CAM_OBJ must be different; do not use the same RGBD camera for both hands and objects.
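
The camera constraint can be checked before launching a run. The helper below is a hypothetical sketch, not part of pose_main.sh; it takes the two values you plan to set for CAM_HAND and CAM_OBJ.

```shell
# Succeed only if both indices are 0, 1, or 2 and differ from each other.
validate_cams() (
    hand=$1; obj=$2
    for v in "$hand" "$obj"; do
        case $v in
            0|1|2) ;;
            *) echo "error: camera index must be 0, 1, or 2 (got '$v')" >&2; exit 1 ;;
        esac
    done
    if [ "$hand" = "$obj" ]; then
        echo "error: CAM_HAND and CAM_OBJ must be different" >&2
        exit 1
    fi
)

# Usage:
#   validate_cams 0 2 && echo "camera selection is valid"
```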

Output: pose results

The poses generated by PX Pose are located in the vis_res folder under h5_file_path:

vis_res/
├── episode_XXX_XXXXXX_XX_XXXXXX
│   └── RGBD_0
│       ├── aligned_left_bracelet_pose_results
│       ├── aligned_obj1_pose_results_RGBD_1
│       ├── aligned_obj2_pose_results_RGBD_1
...     ...
│       ├── aligned_right_bracelet_pose_results
│       ├── raw_left_bracelet_poses
│       ├── raw_obj1_poses
│       ├── raw_obj2_poses
...     ...
│       └── raw_right_bracelet_poses
...
├── episode_YYY_YYYYYY_YY_YYYYYY
...
└── episode_ZZZ_ZZZZZZ_ZZ_ZZZZZZ
    └──...

Poses are stored as TXT files in the subfolders whose names start with aligned and contain pose_results. Each TXT file contains a homogeneous transformation matrix representing the 6D pose of the bracelet or object in a single frame. The vis_res folder is also the default input for the subsequent PX Post-Process part of the pipeline.
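
For a quick sanity check of a result, the translation part of a pose can be read straight from its TXT file. This sketch assumes each file holds a 4x4 matrix written as four rows of four whitespace-separated numbers; the exact on-disk layout is not specified above, so verify it against one of your own files first. The function name pose_translation is illustrative.

```shell
# Print the translation (tx ty tz) of one pose file, i.e. the fourth
# column of the first three rows of the homogeneous matrix.
pose_translation() {
    awk 'NR <= 3 { printf "%s%s", (NR > 1 ? " " : ""), $4 } END { print "" }' "$1"
}

# Usage (file name is a placeholder):
#   pose_translation vis_res/<episode>/RGBD_0/aligned_left_bracelet_pose_results/<frame>.txt
```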