Commit 93b308d

Merge pull request #248 from LouisVanLangendonck/graphuniverse-dataset
Category: A1; Team name: louisvl; Dataset: GraphUniverse

2 parents: 4e25ff1 + 8990530
5 files changed: 174 additions, 0 deletions
README.md (+4 lines, 0 deletions)

```diff
@@ -375,13 +375,17 @@ Specially useful in pre-processing steps, these are the general data manipulatio
 | IMDB-BIN | Classification | Graph-level classification. | [Source](https://dl.acm.org/doi/10.1145/2783258.2783417) |
 | IMDB-MUL | Classification | Graph-level classification. | [Source](https://dl.acm.org/doi/10.1145/2783258.2783417) |
 | REDDIT | Classification | Graph-level classification. | [Source](https://proceedings.neurips.cc/paper_files/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf) |
+| GraphUniverse-IND | Classification | Synthetic generator; inductive node classification. | [Source](https://openreview.net/forum?id=jRWxvQnqUt&noteId=jRWxvQnqUt) |
+| GraphUniverse-TRA | Classification | Synthetic generator; transductive node classification. | [Source](https://openreview.net/forum?id=jRWxvQnqUt&noteId=jRWxvQnqUt) |
 | Amazon | Classification | Heterophilic dataset. | [Source](https://arxiv.org/pdf/1205.6233) |
 | Minesweeper | Classification | Heterophilic dataset. | [Source](https://arxiv.org/pdf/2302.11640) |
 | Empire | Classification | Heterophilic dataset. | [Source](https://arxiv.org/pdf/2302.11640) |
 | Tolokers | Classification | Heterophilic dataset. | [Source](https://arxiv.org/pdf/2302.11640) |
 | US-county-demos | Regression | In turn, each node attribute is used as the target label. | [Source](https://arxiv.org/pdf/2002.08274) |
 | ZINC | Regression | Graph-level regression. | [Source](https://pubs.acs.org/doi/10.1021/ci3001277) |

+**Remark:** GraphUniverse is a synthetic graph generator for community-structured data, enabling control over graph properties such as homophily, feature signal, and degree structure. Live demo: [Demo](https://graphuniverse.streamlit.app/). Package release: [PyPI](https://pypi.org/project/graph-universe/0.1.2/). GitHub repository: [Repo](https://github.com/LouisVanLangendonck/GraphUniverse).
+
 ### Simplicial
 | Dataset | Task | Description | Reference |
```
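The "homophily" knob mentioned in the remark can be made concrete: edge homophily is the fraction of edges whose endpoints share a community label, which is exactly the quantity the `homophily_range` settings below constrain. A minimal stdlib sketch of that definition (illustrative only; `edge_homophily` is not part of the graph-universe API):

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a community label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

# Toy graph: 4 nodes in two communities, "a" = {0, 1} and "b" = {2, 3}.
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
edges = [(0, 1), (2, 3), (1, 2), (0, 3)]  # 2 intra-community, 2 inter-community
print(edge_homophily(edges, labels))  # 0.5
```

A homophily of 0.5 means half the edges stay inside a community; the inductive config below samples this value per graph from [0.4, 0.8].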
New config file — inductive GraphUniverse setting (51 lines added; nesting reconstructed from the interpolation paths):

```yaml
loader:
  _target_: topobench.data.loaders.GraphUniverseDatasetLoader
  parameters:
    data_domain: graph
    data_type: GraphUniverse
    data_name: GraphUniverse
    data_dir: ${paths.data_dir}/${dataset.loader.parameters.data_domain}/${dataset.loader.parameters.data_type}
    generation_parameters:
      task: community_detection
      universe_parameters:
        K: 20
        feature_dim: 15
        center_variance: 0.2
        cluster_variance: 0.4
        edge_propensity_variance: 1.0
        seed: 42
      family_parameters:
        n_graphs: 1000
        n_nodes_range: [50, 200]
        n_communities_range: [3, 7]
        homophily_range: [0.4, 0.8]
        avg_degree_range: [1.0, 2.0]
        degree_separation_range: [0.5, 1.0]
        power_law_exponent_range: [1.5, 2.5]
        seed: ${dataset.loader.parameters.generation_parameters.universe_parameters.seed}

# Dataset parameters
parameters:
  num_features: ${dataset.loader.parameters.generation_parameters.universe_parameters.feature_dim}
  num_classes: ${dataset.loader.parameters.generation_parameters.universe_parameters.K}
  task: classification
  loss_type: cross_entropy
  monitor_metric: accuracy
  task_level: node

# Splits
split_params:
  learning_setting: inductive
  data_split_dir: ${dataset.loader.parameters.data_dir}/data_splits
  data_seed: 0
  split_type: random # either "k-fold" or "random"
  k: 10 # for "k-fold" cross-validation
  train_prop: 0.7 # for the "random" splitting strategy

# Dataloader parameters
dataloader_params:
  batch_size: 16
  num_workers: 0
  pin_memory: False
```
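Each `*_range` entry above is a per-graph sampling interval: with `n_graphs: 1000`, every graph in the family draws its own size, community count, homophily, and degree statistics from these ranges, which is what makes the inductive benchmark vary across graphs. A rough stdlib sketch of that idea (`sample_family_params` is a hypothetical helper, not the actual generator):

```python
import random

def sample_family_params(ranges, n_graphs, seed=42):
    """Draw one generation-parameter set per graph from the configured ranges."""
    rng = random.Random(seed)  # seeded, like the config's `seed: 42`
    family = []
    for _ in range(n_graphs):
        family.append({
            "n_nodes": rng.randint(*ranges["n_nodes_range"]),
            "n_communities": rng.randint(*ranges["n_communities_range"]),
            "homophily": rng.uniform(*ranges["homophily_range"]),
            "avg_degree": rng.uniform(*ranges["avg_degree_range"]),
        })
    return family

# Ranges taken from the inductive config above.
ranges = {
    "n_nodes_range": [50, 200],
    "n_communities_range": [3, 7],
    "homophily_range": [0.4, 0.8],
    "avg_degree_range": [1.0, 2.0],
}
family = sample_family_params(ranges, n_graphs=1000)
```

Because the seed is fixed, the same family of 1000 parameter sets is reproduced on every run.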
New config file — transductive GraphUniverse setting (51 lines added; identical in shape to the inductive config, but with a single 5000-node graph, every range collapsed to a point, and `batch_size: 1`):

```yaml
loader:
  _target_: topobench.data.loaders.GraphUniverseDatasetLoader
  parameters:
    data_domain: graph
    data_type: GraphUniverse
    data_name: GraphUniverse
    data_dir: ${paths.data_dir}/${dataset.loader.parameters.data_domain}/${dataset.loader.parameters.data_type}
    generation_parameters:
      task: community_detection
      universe_parameters:
        K: 10
        feature_dim: 15
        center_variance: 0.2
        cluster_variance: 0.5
        edge_propensity_variance: 0.5
        seed: 42
      family_parameters:
        n_graphs: 1
        n_nodes_range: [5000, 5000]
        n_communities_range: [10, 10]
        homophily_range: [0.5, 0.5]
        avg_degree_range: [2.5, 2.5]
        degree_separation_range: [0.5, 0.5]
        power_law_exponent_range: [2.5, 2.5]
        seed: ${dataset.loader.parameters.generation_parameters.universe_parameters.seed}

# Dataset parameters
parameters:
  num_features: ${dataset.loader.parameters.generation_parameters.universe_parameters.feature_dim}
  num_classes: ${dataset.loader.parameters.generation_parameters.universe_parameters.K}
  task: classification
  loss_type: cross_entropy
  monitor_metric: accuracy
  task_level: node

# Splits
split_params:
  learning_setting: transductive
  data_split_dir: ${dataset.loader.parameters.data_dir}/data_splits
  data_seed: 0
  split_type: random # either "k-fold" or "random"
  k: 10 # for "k-fold" cross-validation
  train_prop: 0.7 # for the "random" splitting strategy

# Dataloader parameters
dataloader_params:
  batch_size: 1
  num_workers: 0
  pin_memory: False
```
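In the transductive setting the `split_params` apply within the single graph: `split_type: random` with `train_prop: 0.7` and `data_seed: 0` carves the 5000 nodes into a 70% training set and a 30% held-out remainder. A minimal stdlib sketch of that strategy (`random_split` is an illustrative stand-in, not TopoBench's split code):

```python
import random

def random_split(n, train_prop=0.7, seed=0):
    """Shuffle node indices and split off the first `train_prop` fraction."""
    rng = random.Random(seed)  # seeded like `data_seed: 0` for reproducibility
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * train_prop)
    return idx[:cut], idx[cut:]

train, rest = random_split(5000, train_prop=0.7, seed=0)
```

With `split_type: 'k-fold'` instead, the `k: 10` setting would control the number of cross-validation folds.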

pyproject.toml (+1 line, 0 deletions)

```diff
@@ -55,6 +55,7 @@ dependencies=[
     "rootutils",
     "topomodelx @ git+https://github.com/pyt-team/TopoModelX.git",
     "toponetx @ git+https://github.com/pyt-team/TopoNetX.git@c378925",
+    "graph-universe==0.1.2",
     "lightning==2.4.0",
     "torch-scatter",
     "torch-sparse",
```
New loader module (67 lines added):

```python
"""Loaders for GraphUniverse [1] datasets.

[1] "GraphUniverse: Enabling Systematic Evaluation of Inductive Generalization"
by Louis Van Langendonck, Guillermo Bernardez, Nina Miolane, and Pere Barlet-Ros.
Accepted at The Fourteenth International Conference on Learning Representations, 2026.
https://openreview.net/forum?id=jRWxvQnqUt
"""

from graph_universe import GraphUniverseDataset
from omegaconf import DictConfig
from torch_geometric.data import Data, Dataset

from topobench.data.loaders.base import AbstractLoader


class GraphUniverseDatasetLoader(AbstractLoader):
    """Load GraphUniverse datasets.

    Parameters
    ----------
    parameters : DictConfig
        Configuration parameters containing:
        - data_dir: Root directory for data
        - data_name: Name of the dataset
        - data_type: Type of the dataset (e.g., "GraphUniverse")
        - generation_parameters: Parameters passed to the synthetic generator
    """

    def __init__(self, parameters: DictConfig) -> None:
        super().__init__(parameters)

    def load_dataset(self) -> Dataset:
        """Load a GraphUniverse dataset.

        Returns
        -------
        Dataset
            The loaded GraphUniverse dataset.

        Raises
        ------
        RuntimeError
            If dataset loading fails.
        """
        dataset = GraphUniverseDataset(
            root=str(self.root_data_dir),
            parameters=self.parameters["generation_parameters"],
        )
        return dataset

    def load(self, **kwargs) -> tuple[Data, str]:
        """Load data.

        Parameters
        ----------
        **kwargs : dict
            Additional keyword arguments (unused).

        Returns
        -------
        tuple[torch_geometric.data.Data, str]
            Tuple containing the loaded dataset and the raw data directory.
        """
        # load_dataset() takes no arguments, so do not forward **kwargs to it.
        dataset = self.load_dataset()
        data_dir = dataset.raw_dir

        return dataset, data_dir
```
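The loader's contract is small: `load()` builds the dataset from `generation_parameters` and returns it together with its raw data directory. A self-contained sketch of that contract using stand-ins (`StubDataset` and `StubLoader` are hypothetical; the real classes require the graph-universe and torch_geometric packages):

```python
class StubDataset:
    """Stand-in for GraphUniverseDataset: records its root and parameters."""
    def __init__(self, root, parameters):
        self.root = root
        self.parameters = parameters
        self.raw_dir = f"{root}/raw"  # PyG-style raw-data directory

class StubLoader:
    """Mirrors GraphUniverseDatasetLoader.load(): returns (dataset, raw dir)."""
    def __init__(self, parameters):
        self.parameters = parameters
        self.root_data_dir = parameters["data_dir"]

    def load_dataset(self):
        return StubDataset(
            root=str(self.root_data_dir),
            parameters=self.parameters["generation_parameters"],
        )

    def load(self):
        dataset = self.load_dataset()
        return dataset, dataset.raw_dir

# Shape of the config the Hydra YAML above resolves to (values illustrative).
params = {
    "data_dir": "datasets/graph/GraphUniverse",
    "generation_parameters": {"task": "community_detection"},
}
dataset, data_dir = StubLoader(params).load()
print(data_dir)  # datasets/graph/GraphUniverse/raw
```

Returning the raw directory alongside the dataset lets downstream TopoBench code place derived artifacts (such as the `data_splits` directory from the configs) next to the generated data.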
