GraphNeuralNetworks.jl/GNNLux/docs/src_tutorials/graph_classification.jl at 611b866c00eda3398830be59cc25b1efa8fff821 · juanmigutierrez/GraphNeuralNetworks.jl · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
# # Graph Classification with Graph Neural Networks

# *This tutorial is a Julia adaptation of the Pytorch Geometric tutorial that can be found [here](https://pytorch-geometric.readthedocs.io/en/latest/notes/colabs.html).*

# In this tutorial session we will have a closer look at how to apply **Graph Neural Networks (GNNs) to the task of graph classification**.
# Graph classification refers to the problem of classifying entire graphs (in contrast to nodes), given a **dataset of graphs**, based on some structural graph properties and possibly on some input node features.
# Here, we want to embed entire graphs, and we want to embed those graphs in such a way so that they are linearly separable given a task at hand.
# We will use a graph convolutional network to create a vector embedding of the input graph, and the apply a simple linear classification head to perform the final classification.


# A common graph classification task is **molecular property prediction**, in which molecules are represented as graphs, and the task may be to infer whether a molecule inhibits HIV virus replication or not.

# The TU Dortmund University has collected a wide range of different graph classification datasets, known as the [**TUDatasets**](https://chrsmrrs.github.io/datasets/), which are also accessible via MLDatasets.jl.
# Let's import the necessary packages. Then we'll load and inspect one of the smaller ones, the **MUTAG dataset**:

using Lux
using GNNLux
using MLDatasets
using MLUtils
using LinearAlgebra, Random, Statistics
using Zygote, Optimisers, OneHotArrays


struct GlobalPool{F} <: GNNLayer
    aggr::F
end

(l::GlobalPool)(g::GNNGraph, x::AbstractArray, ps, st) = GNNlib.global_pool(l, g, x), st

(l::GlobalPool)(g::GNNGraph) = GNNGraph(g, gdata = l(g, node_features(g), ps, st))

ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"  # don't ask for dataset download confirmation
rng = Random.seed!(42); # for reproducibility

dataset = TUDataset("MUTAG")
#
dataset.graph_data.targets |> union
#
g1, y1 = dataset[1] # get the first graph and target
#
reduce(vcat, g.node_data.targets for (g, _) in dataset) |> union
#

reduce(vcat, g.edge_data.targets for (g, _) in dataset) |> union

# This dataset provides **188 different graphs**, and the task is to classify each graph into **one out of two classes**.

# By inspecting the first graph object of the dataset, we can see that it comes with **17 nodes** and **38 edges**.
# It also comes with exactly **one graph label**, and provides additional node labels (7 classes) and edge labels (4 classes).
# However, for the sake of simplicity, we will not make use of edge labels.

# We now convert the `MLDatasets.jl` graph types to our `GNNGraph`s and we also onehot encode both the node labels (which will be used as input features) and the graph labels (what we want to predict):

graphs = mldataset2gnngraph(dataset)
graphs = [GNNGraph(g,
                    ndata = Float32.(onehotbatch(g.ndata.targets, 0:6)),
                    edata = nothing)
            for g in graphs]
y = onehotbatch(dataset.graph_data.targets, [-1, 1])


# We have some useful utilities for working with graph datasets, *e.g.*, we can shuffle the dataset and use the first 150 graphs as training graphs, while using the remaining ones for testing:

train_data, test_data = splitobs((graphs, y), at = 150, shuffle = true) |> getobs


train_loader = DataLoader(train_data, batchsize = 32, shuffle = true)
test_loader = DataLoader(test_data, batchsize = 32, shuffle = false)


# Here, we opt for a `batch_size` of 32, leading to 5 (randomly shuffled) mini-batches, containing all $4 \cdot 32+22 = 150$ graphs.


# ## Mini-batching of graphs

# Since graphs in graph classification datasets are usually small, a good idea is to **batch the graphs** before inputting them into a Graph Neural Network to guarantee full GPU utilization.
# In the image or language domain, this procedure is typically achieved by **rescaling** or **padding** each example into a set of equally-sized shapes, and examples are then grouped in an additional dimension.
# The length of this dimension is then equal to the number of examples grouped in a mini-batch and is typically referred to as the `batchsize`.

# However, for GNNs the two approaches described above are either not feasible or may result in a lot of unnecessary memory consumption.
# Therefore, GNNLux.jl opts for another approach to achieve parallelization across a number of examples. Here, adjacency matrices are stacked in a diagonal fashion (creating a giant graph that holds multiple isolated subgraphs), and node and target features are simply concatenated in the node dimension (the last dimension).

# This procedure has some crucial advantages over other batching procedures:

# 1. GNN operators that rely on a message passing scheme do not need to be modified since messages are not exchanged between two nodes that belong to different graphs.

# 2. There is no computational or memory overhead since adjacency matrices are saved in a sparse fashion holding only non-zero entries, *i.e.*, the edges.

# GNNLux.jl can **batch multiple graphs into a single giant graph**:

vec_gs, _ = first(train_loader)
#
MLUtils.batch(vec_gs)

# Each batched graph object is equipped with a **`graph_indicator` vector**, which maps each node to its respective graph in the batch:

# ```math
# \textrm{graph\_indicator} = [1, \ldots, 1, 2, \ldots, 2, 3, \ldots ]
# ```

# ## Training a Graph Neural Network (GNN)

# Training a GNN for graph classification usually follows a simple recipe:

# 1. Embed each node by performing multiple rounds of message passing
# 2. Aggregate node embeddings into a unified graph embedding (**readout layer**)
# 3. Train a final classifier on the graph embedding

# There exists multiple **readout layers** in literature, but the most common one is to simply take the average of node embeddings:

# ```math
# \mathbf{x}_{\mathcal{G}} = \frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} \mathcal{x}^{(L)}_v
# ```

# GNNLux.jl provides this functionality via `GlobalPool(mean)`, which takes in the node embeddings of all nodes in the mini-batch and the assignment vector `graph_indicator` to compute a graph embedding of size `[hidden_channels, batchsize]`.

# The final architecture for applying GNNs to the task of graph classification then looks as follows and allows for complete end-to-end training:

# Then use it in the model
function create_model_graphconv(nin, nh, nout)
    GNNChain(GraphConv(nin => nh, relu),
             GraphConv(nh => nh, relu),
             GraphConv(nh => nh),
             GlobalPool(mean),
             Dropout(0.5),
             Dense(nh, nout))
end

nin = 7
nh = 64
nout = 2
model = create_model_graphconv(nin, nh, nout)

ps, st = LuxCore.initialparameters(rng, model), LuxCore.initialstates(rng, model);

# Here, we again make use of the `GCNConv` with $\mathrm{ReLU}(x) = \max(x, 0)$ activation for obtaining localized node embeddings, before we apply our final classifier on top of a graph readout layer.

# Let's train our network for a few epochs to see how well it performs on the training as well as test set:

function custom_loss(model, ps, st, tuple)
    g, x, y = tuple
    logitcrossentropy = CrossEntropyLoss(; logits=Val(true))
    st = Lux.trainmode(st)
    ŷ, st = model(g, x, ps, st)
    return  logitcrossentropy(ŷ, y), (; layers = st), 0
end

function eval_loss_accuracy(model, ps, st, data_loader)
    loss = 0.0
    acc = 0.0
    ntot = 0
    logitcrossentropy = CrossEntropyLoss(; logits=Val(true))
    for (g, y) in data_loader
        g = MLUtils.batch(g)
        n = length(y)
        ŷ, _ = model(g, g.ndata.x, ps, st)
        loss += logitcrossentropy(ŷ, y) * n
        acc += mean((ŷ .> 0) .== y) * n
        ntot += n
    end
    return (loss = round(loss / ntot, digits = 4),
            acc = round(acc * 100 / ntot, digits = 2))
end

function train_model!(model, ps, st; epochs = 500, infotime = 100)
    train_state = Lux.Training.TrainState(model, ps, st, Adam(1e-2))

    function report(epoch)
        train = eval_loss_accuracy(model, ps, st, train_loader)
        st = Lux.testmode(st)
        test = eval_loss_accuracy(model, ps, st, test_loader)
        st = Lux.trainmode(st)
        @info (; epoch, train, test)
    end
    report(0)
    for iter in 1:epochs
        for (g, y) in train_loader
            g = MLUtils.batch(g)
            _, loss, _, train_state = Lux.Training.single_train_step!(AutoZygote(), custom_loss,(g, g.ndata.x, y), train_state)
        end

        iter % infotime == 0 && report(iter)
    end
    return model, ps, st
end

model, ps, st = train_model!(model, ps, st);

# As one can see, our model reaches around **74% test accuracy**.
# Reasons for the fluctuations in accuracy can be explained by the rather small dataset (only 38 test graphs), and usually disappear once one applies GNNs to larger datasets.

# ## (Optional) Exercise

# Can we do better than this?
# As multiple papers pointed out ([Xu et al. (2018)](https://arxiv.org/abs/1810.00826), [Morris et al. (2018)](https://arxiv.org/abs/1810.02244)), applying **neighborhood normalization decreases the expressivity of GNNs in distinguishing certain graph structures**.
# An alternative formulation ([Morris et al. (2018)](https://arxiv.org/abs/1810.02244)) omits neighborhood normalization completely and adds a simple skip-connection to the GNN layer in order to preserve central node information:

# ```math
# \mathbf{x}_i^{(\ell+1)} = \mathbf{W}^{(\ell + 1)}_1 \mathbf{x}_i^{(\ell)} + \mathbf{W}^{(\ell + 1)}_2 \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j^{(\ell)}
# ```

# This layer is implemented under the name `GraphConv` in GraphNeuralNetworks.jl.

# As an exercise, you are invited to complete the following code to the extent that it makes use of `