Commit 7ad5b85

docs(env): clean dataprocess env with only dataset api.

* remove evalai & open3d etc.
* fix #9
* visualization: add vis of the raw point cloud.

1 parent 8ffb458 commit 7ad5b85

File tree

4 files changed: +27 -27 lines

dataprocess/README.md

Lines changed: 15 additions & 15 deletions
````diff
@@ -12,47 +12,47 @@ We've updated the process dataset for:
 - [x] Waymo: check [here](#waymo-dataset). The process script was adapted from [SeFlow](https://github.com/KTH-RPL/SeFlow).
 - [ ] nuScenes: coding done; it will be made public after review and released later with another paper.
 
-If you want to use all the datasets above, there is a dedicated processing environment in [envprocess.yml](../envprocess.yml) that installs all the necessary packages, since the Waymo package has a different configuration that conflicts with the main environment. Set it up with the following commands:
+If you want to use all the datasets above, there is a dedicated processing environment in [envprocess.yaml](../envprocess.yaml) that installs all the necessary packages, since the Waymo package has a different configuration that conflicts with the main environment. Set it up with the following commands:
 
 ```bash
-conda env create -f envprocess.yml
+conda env create -f envprocess.yaml
 conda activate dataprocess
+# NOTE: we need to **manually reinstall numpy** (>= 1.22):
+# * the waymo package forces numpy==1.21.5, BUT
+# * hdbscan with numpy<1.22.0 raises: 'numpy.float64' object cannot be interpreted as an integer
+# * av2 needs numpy>=1.22.0, and waymo runs fine with numpy==1.22.0 in practice
 pip install numpy==1.22
 ```
 
 ## Download
 
 ### Argoverse 2.0
 
-Install their download tool:
-```bash
-mamba install s5cmd -c conda-forge
-```
-
-Download the dataset:
+Install their download tool `s5cmd` (already included in our envprocess.yaml), then download the dataset:
 ```bash
 # train is really big (750): 966 GB in total
-s5cmd --no-sign-request cp "s3://argoverse/datasets/av2/sensor/train/*" sensor/train
+s5cmd --numworkers 12 --no-sign-request cp "s3://argoverse/datasets/av2/sensor/train/*" av2/sensor/train
 
 # val (150) and test (150): 168 GB + 168 GB in total
-s5cmd --no-sign-request cp "s3://argoverse/datasets/av2/sensor/val/*" sensor/val
-s5cmd --no-sign-request cp "s3://argoverse/datasets/av2/sensor/test/*" sensor/test
+s5cmd --numworkers 12 --no-sign-request cp "s3://argoverse/datasets/av2/sensor/val/*" av2/sensor/val
+s5cmd --numworkers 12 --no-sign-request cp "s3://argoverse/datasets/av2/sensor/test/*" av2/sensor/test
 
 # local and online eval masks from the official repo
 s5cmd --no-sign-request cp "s3://argoverse/tasks/3d_scene_flow/zips/*" .
 ```
 
 Then, to quickly pre-process the data, [read these commands](#process) on how to generate the pre-processed data for training and evaluation. This takes around 0.5-2 hours for the whole dataset (train & val) depending on how powerful your CPU is.
 
-More [self-supervised data in AV2 LiDAR only](https://www.argoverse.org/av2.html#lidar-link); note: it **does not** include **imagery or 3D annotations**. The dataset is designed to support research into self-supervised learning in the lidar domain, as well as point cloud forecasting.
+Optional: more [self-supervised data in AV2 LiDAR only](https://www.argoverse.org/av2.html#lidar-link); note: it **does not** include **imagery or 3D annotations**. The dataset is designed to support research into self-supervised learning in the lidar domain, as well as point cloud forecasting.
 ```bash
 # train is really big (16000): 4 TB in total
-s5cmd --no-sign-request cp "s3://argoverse/datasets/av2/lidar/train/*" lidar/train
+s5cmd --numworkers 12 --no-sign-request cp "s3://argoverse/datasets/av2/lidar/train/*" av2/lidar/train
 
 # val (2000): 0.5 TB in total
-s5cmd --no-sign-request cp "s3://argoverse/datasets/av2/lidar/val/*" lidar/val
+s5cmd --numworkers 12 --no-sign-request cp "s3://argoverse/datasets/av2/lidar/val/*" av2/lidar/val
 
 # test (2000): 0.5 TB in total
-s5cmd --no-sign-request cp "s3://argoverse/datasets/av2/lidar/test/*" lidar/test
+s5cmd --numworkers 12 --no-sign-request cp "s3://argoverse/datasets/av2/lidar/test/*" av2/lidar/test
 ```
 
 #### Dataset frames
````
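The manual `pip install numpy==1.22` step exists because 1.22.0 is the only version satisfying all three constraints quoted in the README note. A minimal stdlib sketch of that reasoning (helper names `parse` and `works_in_practice` are illustrative, not part of the repo; real resolution happens in pip's metadata):

```python
# Sketch: check whether a numpy version satisfies the constraints
# described in the README note. Names here are hypothetical.

def parse(v: str) -> tuple:
    """Turn a version string like '1.22.0' into a comparable tuple."""
    return tuple(int(x) for x in v.split("."))

# Constraints quoted in the diff:
#   waymo-open-dataset pins numpy==1.21.5 (too strict in practice)
#   hdbscan breaks for numpy < 1.22.0
#   av2 requires numpy >= 1.22.0
def works_in_practice(v: str) -> bool:
    return parse(v) >= parse("1.22.0")

print(works_in_practice("1.21.5"))  # False -> why the manual reinstall is needed
print(works_in_practice("1.22.0"))  # True  -> the version pip installs above
```

This is why the reinstall must come *after* `conda env create`: the waymo pip package would otherwise downgrade numpy back to 1.21.5 during environment creation.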

dataprocess/extract_waymo.py

Lines changed: 1 addition & 3 deletions
```diff
@@ -346,7 +346,6 @@ def create_group_data(group, pc, pose, gm = None, flow_0to1=None, flow_valid=Non
     first_frame = dataset_pb2.Frame.FromString(bytearray(all_data[0]))
     scene_id = first_frame.context.name
     total_lens = len(all_data)
-    # for data_idx in tqdm(range(1, total_lens), ncols=100):
     for data_idx in range(1, total_lens):
         if data_idx >= total_lens - 2:
             # 0: no correct flow label, end(total_lens - 1) - 1: no correct pose flow
@@ -384,7 +383,6 @@ def process_logs(data_dir: Path, map_dir: Path, output_dir: Path, nproc: int):
         data_dir: Argoverse 2.0 directory
         output_dir: Output directory.
     """
-
     if not (data_dir).exists():
         print(f'{data_dir} not found')
         return
@@ -408,7 +406,7 @@ def process_logs(data_dir: Path, map_dir: Path, output_dir: Path, nproc: int):
 def main(
     flow_data_dir: str = "/home/kin/data/waymo/flowlabel",
     mode: str = "test",
-    map_dir: str = "/home/kin/data/waymo/flowlabel/maps",
+    map_dir: str = "/home/kin/data/waymo/flowlabel/map",
     output_dir: str = "/home/kin/data/waymo/flowlabel/preprocess",
     nproc: int = (multiprocessing.cpu_count() - 1),
     create_index_only: bool = False,
```
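The loop in `create_group_data` deliberately excludes frame 0 (no flow label) and the last two frames (no valid pose flow). A toy stand-in showing exactly which indices survive that guard (the `continue` is illustrative; the diff does not show what the guarded branch actually does):

```python
# Toy sketch of the frame-index logic in create_group_data:
# index 0 has no correct flow label, and the last two indices lack
# a valid pose flow, so only 1 .. total_lens - 3 are processed.
all_data = list(range(10))          # stand-in for the decoded Waymo frames
total_lens = len(all_data)

processed = []
for data_idx in range(1, total_lens):
    if data_idx >= total_lens - 2:  # same guard as in the diff
        continue                    # illustrative; real handling is elided in the diff
    processed.append(data_idx)

print(processed)  # [1, 2, 3, 4, 5, 6, 7]
```

So a 10-frame scene yields 7 processed frames, which matters when estimating preprocessing output sizes.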

envprocess.yaml

Lines changed: 6 additions & 7 deletions
```diff
@@ -6,25 +6,24 @@ dependencies:
   - python=3.8
   - pytorch::pytorch=2.0.0
   - pytorch::torchvision
+  - mkl==2024.0.0
   - numba
-  - numpy==1.22
+  - numpy
   - pandas
   - pip
   - scipy
   - tqdm
-  - scikit-learn
   - fire
+  - hdbscan
+  - s5cmd
   - pip:
     - nuscenes-devkit
     - av2==0.2.1
     - waymo-open-dataset-tf-2.11.0==1.5.0
-    - open3d==0.18.0
+    - dufomap==1.0.0
     - linefit
     - dztimer
-    - dufomap==1.0.0
-    - evalai
 
 # Reasons for the fixed versions:
 # numpy==1.22: package conflicts, numpy needs to be >= 1.22
-# open3d==0.18.0: because 0.17.0 has a bug when setting the view json file
-# dufomap==1.0.0: in case a later update is not compatible with the code
+# mkl==2024.0.0: https://github.com/pytorch/pytorch/issues/123097
```
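After this cleanup only a handful of dependencies keep a hard `==` pin (mkl, av2, waymo-open-dataset, dufomap). A small stdlib sketch that extracts those pins from an env-file fragment, useful as a quick audit when editing the file (the fragment and parsing are illustrative; a real check would use a YAML parser):

```python
# Sketch: list which dependencies in the env file carry a hard "==" pin.
# Pure stdlib, operating on a hand-copied fragment of envprocess.yaml.
fragment = """\
  - mkl==2024.0.0
  - numpy
  - av2==0.2.1
  - waymo-open-dataset-tf-2.11.0==1.5.0
  - dufomap==1.0.0
"""

pins = {}
for line in fragment.splitlines():
    dep = line.strip().lstrip("- ")     # drop the YAML list marker
    if "==" in dep:
        # rpartition handles names that themselves contain digits/dots
        name, _, version = dep.rpartition("==")
        pins[name] = version

print(pins)
```

Note `numpy` is intentionally unpinned here: the pin moved to the manual `pip install numpy==1.22` step documented in the README.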

tools/visualization.py

Lines changed: 5 additions & 2 deletions
```diff
@@ -62,7 +62,7 @@ def check_flow(
 def vis(
     data_dir: str = "/home/kin/data/av2/preprocess/sensor/mini",
     res_name: str = "flow", # "flow", "flow_est"
-    start_id: int = -1,
+    start_id: int = 0,
     point_size: float = 2.0,
 ):
     dataset = HDF5Data(data_dir, vis_name=res_name, flow_view=True)
@@ -88,7 +88,10 @@ def vis(
         pose_flow = pc0[:, :3] @ ego_pose[:3, :3].T + ego_pose[:3, 3] - pc0[:, :3]
 
         pcd = o3d.geometry.PointCloud()
-        if res_name in ['dufo_label', 'label']:
+        if res_name == 'raw': # no result, only show the **raw point cloud**
+            pcd.points = o3d.utility.Vector3dVector(pc0[:, :3])
+            pcd.paint_uniform_color([1.0, 1.0, 1.0])
+        elif res_name in ['dufo_label', 'label']:
             labels = data[res_name]
             pcd_i = o3d.geometry.PointCloud()
             for label_i in np.unique(labels):
```
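The new `'raw'` branch paints every point uniform white instead of coloring by label or flow. A dependency-free sketch of that dispatch, with the Open3D calls reduced to comments (`choose_color_mode` and the `'flow-colormap'` default are hypothetical names, not the repo's API):

```python
# Sketch of the res_name dispatch added to vis() in tools/visualization.py.
# 'raw' means there is no result to overlay, so every point gets one color.
def choose_color_mode(res_name: str) -> str:
    if res_name == 'raw':
        return 'uniform-white'      # pcd.paint_uniform_color([1.0, 1.0, 1.0])
    elif res_name in ('dufo_label', 'label'):
        return 'per-cluster'        # one color per unique label in np.unique(labels)
    return 'flow-colormap'          # assumed default: color by estimated flow

print(choose_color_mode('raw'))     # uniform-white
```

Keeping `'raw'` as the first branch means users can inspect a sequence before any flow results exist, which is what the commit message's "vis raw point cloud" refers to.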
