Commit 03706c5

Modernize Neural Networks tutorial patterns (#3881)
Fixes #3877

## Description

Modernizes the Neural Networks tutorial by updating the old-style `super()` call, replacing the commented `.data` parameter update with a `torch.no_grad()` update, and clarifying why `net.zero_grad()` is still used before the optimizer section introduces `optimizer.zero_grad()`.

## Testing

- `git diff --check origin/main..HEAD`
- `python -m py_compile beginner_source/blitz/neural_networks_tutorial.py`
- `python beginner_source/blitz/neural_networks_tutorial.py`

## Checklist

- [x] The issue that is being fixed is referred in the description (see above "Fixes #ISSUE_NUMBER")
- [x] Only one issue is addressed in this pull request
- [x] Labels from the issue that this PR is fixing are added to this pull request
- [x] No unnecessary issues are included into this pull request.

Co-authored-by: sekyondaMeta <127536312+sekyondaMeta@users.noreply.github.com>
1 parent a06493e commit 03706c5

1 file changed

Lines changed: 7 additions & 4 deletions

beginner_source/blitz/neural_networks_tutorial.py

@@ -45,13 +45,13 @@
 class Net(nn.Module):

     def __init__(self):
-        super(Net, self).__init__()
+        super().__init__()
         # 1 input image channel, 6 output channels, 5x5 square convolution
         # kernel
         self.conv1 = nn.Conv2d(1, 6, 5)
         self.conv2 = nn.Conv2d(6, 16, 5)
         # an affine operation: y = Wx + b
-        self.fc1 = nn.Linear(16 * 5 * 5, 120) # 5*5 from image dimension
+        self.fc1 = nn.Linear(16 * 5 * 5, 120) # 5*5 from image dimension
         self.fc2 = nn.Linear(120, 84)
         self.fc3 = nn.Linear(84, 10)
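
The `super()` change in this hunk is a spelling change only. As a minimal sketch of why the two forms are interchangeable on Python 3, the snippet below uses plain Python classes; `Base`, `OldStyle`, and `NewStyle` are hypothetical names for illustration and are not part of the tutorial.

```python
class Base:
    def __init__(self):
        # Stand-in for the setup work a parent class performs in its constructor.
        self.initialized = True


class OldStyle(Base):
    def __init__(self):
        # Explicit form: the class and the instance are passed by hand.
        super(OldStyle, self).__init__()


class NewStyle(Base):
    def __init__(self):
        # Zero-argument form: Python 3 fills in the class and instance itself.
        super().__init__()


old, new = OldStyle(), NewStyle()
print(old.initialized, new.initialized)  # True True
```

Both subclasses run `Base.__init__`, so the zero-argument form should leave the behavior of `Net` unchanged while avoiding a repeated class name that can go stale after a rename.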

@@ -205,7 +205,9 @@ def forward(self, input):
 #
 #
 # Now we shall call ``loss.backward()``, and have a look at conv1's bias
-# gradients before and after the backward.
+# gradients before and after the backward. Since we have not introduced an
+# optimizer yet, we clear the gradients directly on the model. Once using an
+# optimizer, prefer ``optimizer.zero_grad()`` as shown below.


 net.zero_grad() # zeroes the gradient buffers of all parameters
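
The comment added in this hunk relies on the fact that `backward()` accumulates into `.grad` rather than overwriting it. The following is a rough sketch of that behavior, assuming a small `nn.Linear` layer as a hypothetical stand-in for the tutorial's `Net`; the input shape and the use of `sum()` as a loss are illustrative choices only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)   # hypothetical stand-in for the tutorial's Net
x = torch.randn(4, 3)

# First backward pass populates the gradient buffers.
layer(x).sum().backward()
first = layer.bias.grad.clone()

# A second backward pass without clearing adds to the existing buffers.
layer(x).sum().backward()
accumulated = layer.bias.grad.clone()

# Clearing on the module directly, as the tutorial does before an
# optimizer has been introduced.
layer.zero_grad()
layer(x).sum().backward()
fresh = layer.bias.grad.clone()

print(torch.allclose(accumulated, 2 * first))  # True
print(torch.allclose(fresh, first))            # True
```

Once an optimizer owns the parameters, `optimizer.zero_grad()` clears the same buffers, which is why the new comment points readers to it for the later section.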
@@ -246,7 +248,8 @@ def forward(self, input):
 #
 #     learning_rate = 0.01
 #     for f in net.parameters():
-#         f.data.sub_(f.grad.data * learning_rate)
+#         with torch.no_grad():
+#             f -= f.grad * learning_rate
 #
 # However, as you use neural networks, you want to use various different
 # update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.
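
To check that the `torch.no_grad()` update in this hunk behaves like a plain SGD step, one option is to compare it against `torch.optim.SGD` on identical weights. This is a sketch under stated assumptions: the `nn.Linear` model, the random data, and the names `manual` and `reference` are hypothetical and do not come from the tutorial.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
manual = nn.Linear(3, 1)            # hypothetical stand-in for the tutorial's Net
reference = copy.deepcopy(manual)   # same starting weights, updated by the optimizer
x, target = torch.randn(8, 3), torch.randn(8, 1)
criterion = nn.MSELoss()
learning_rate = 0.01

# Manual update in the style introduced by this commit.
manual.zero_grad()
criterion(manual(x), target).backward()
with torch.no_grad():
    for p in manual.parameters():
        p -= p.grad * learning_rate   # in-place update, not recorded by autograd

# The same step expressed through the optimizer API.
optimizer = torch.optim.SGD(reference.parameters(), lr=learning_rate)
optimizer.zero_grad()
criterion(reference(x), target).backward()
optimizer.step()

same = all(
    torch.allclose(a, b)
    for a, b in zip(manual.parameters(), reference.parameters())
)
print(same)
```

The sketch wraps the whole loop in one `torch.no_grad()` block, whereas the diff enters the context once per parameter; both placements keep the in-place update out of the autograd graph.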
