differences for PR #624

actions-user · actions-user · commit 2c5bfd186d2b · 2025-12-09T15:42:00.000Z
diff --git a/1-introduction.md b/1-introduction.md
@@ -133,7 +133,7 @@ In most neural networks, neurons are aggregated into layers. Signals travel from
 The image below shows an example of a neural network with three layers, each circle is a neuron, each line is an edge and the arrows indicate the direction data moves in.
 
 ![
-Image credit: Glosser.ca, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons, 
+Image credit: Glosser.ca, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons,
 [original source](https://commons.wikimedia.org/wiki/File:Colored_neural_network.svg)
 ](fig/01_neural_net.png){
 alt='A diagram of a three layer neural network with an input layer, one hidden layer, and an output layer.'
@@ -487,12 +487,16 @@ Keras also benefits from a very good set of [online documentation](https://keras
 Follow the [setup instructions](learners/setup.md#packages) to install Keras, Seaborn and scikit-learn.
 
 ## Testing Keras Installation
-Keras is available as a module within TensorFlow, as described in the [setup instructions](learners/setup.md#packages).
+Keras is available as a standalone package, as described in the [setup instructions](learners/setup.md#packages).
-Let's therefore check whether you have a suitable version of TensorFlow installed.
+Let's therefore check whether you have a suitable version of Keras installed.
 Open up a new Jupyter notebook or interactive python console and run the following commands:
 ```python
-import tensorflow
-print(tensorflow.__version__)
+# Note: Before importing Keras, we have to instruct it to use PyTorch as the backend.
+import os
+os.environ['KERAS_BACKEND'] = 'torch'
+
+import keras
+print(keras.__version__)
 ```
 ```output
 2.17.0
diff --git a/2-keras.md b/2-keras.md
@@ -281,7 +281,7 @@ the same results (assuming you give the same integer) every time it is called.
 ```python
 from sklearn.model_selection import train_test_split
 
-X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=0, shuffle=True, stratify=target)
+x_train, x_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=0, shuffle=True, stratify=target)
 ```
 
 ::: callout
@@ -302,19 +302,21 @@ This is a good time for switching instructor and/or a break.
 ### Keras for neural networks
 
 Keras is a machine learning framework with ease of use as one of its main features.
-It is part of the tensorflow python package and can be imported using `from tensorflow import keras`.
+It is a standalone Python package that supports multiple deep learning frameworks as backends, and it can be imported using `import keras`.
+Here, we will use Keras with the PyTorch backend.
 
 Keras includes functions, classes and definitions to define deep learning models, cost functions and optimizers (optimizers are used to train a model).
 
 Before we move on to the next section of the workflow we need to make sure we have Keras imported.
 We do this as follows:
 ```python
-from tensorflow import keras
+import keras
 ```
 
 For this episode it is useful if everyone gets the same results from their training.
 Keras uses a random number generator at certain points during its execution.
-Therefore we will need to set two random seeds, one for numpy and one for tensorflow:
+Therefore, we will need to set two random seeds: one for NumPy and one for PyTorch.
+We can use a built-in Keras function to achieve this in one line of code:
 ```python
 keras.utils.set_random_seed(2)
 ```
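As a rough illustration of why fixing a seed gives repeatable results, the sketch below uses only Python's standard-library `random` module as a stand-in. It does not show how Keras seeds NumPy or PyTorch internally, and the helper name `draw_numbers` is hypothetical.

```python
import random


def draw_numbers(seed, count=3):
    """Return `count` pseudo-random floats drawn from a freshly seeded generator."""
    rng = random.Random(seed)  # independent generator, seeded explicitly
    return [rng.random() for _ in range(count)]


# Seeding twice with the same integer reproduces the same sequence.
first_run = draw_numbers(seed=2)
second_run = draw_numbers(seed=2)

print(first_run == second_run)  # True
```

The same idea is what makes training runs comparable between learners: with the seed fixed, the random choices made during training start from the same state.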
@@ -348,7 +350,7 @@ and outputs a layer needs and therefore how many edges need to be created.
 This means we need to inform Keras how big our input is going to be. We do this by instantiating a `keras.Input` class and tell it how big our input is, thus the number of columns it contains.
 
 ```python
-inputs = keras.Input(shape=(X_train.shape[1],))
+inputs = keras.Input(shape=(x_train.shape[1],))
 ```
 
 We store a reference to this input class in a variable so we can pass it to the creation of
@@ -369,7 +371,7 @@ for inputs that are 0 and below and the identity function (returning the same va
 for inputs above 0.
 This is a commonly used activation function in deep neural networks that is proven to work well.
 
-Next we see an extra set of parenthenses with inputs in them. This means that after creating an
+Next we see an extra set of parentheses with `inputs` in them. This means that after creating an
 instance of the Dense layer we call it as if it was a function.
 This tells the Dense layer to connect the layer passed as a parameter, in this case the inputs.
 
@@ -383,7 +385,7 @@ output_layer = keras.layers.Dense(3, activation="softmax")(hidden_layer)
 
 Because we chose the one-hot encoding, we use three neurons for the output layer.
 
-The `softmax` activation ensures that the three output neurons produce values in the range
+The [`softmax`](https://keras.io/api/layers/activations/#softmax-function) activation ensures that the three output neurons produce values in the range
 (0, 1) and they sum to 1.
 We can interpret this as a kind of 'probability' that the sample belongs to a certain
 species.
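To make the two properties of softmax concrete, here is a minimal pure-Python sketch. It is an illustration of the formula only, not the Keras implementation, and the `raw_outputs` values are hypothetical.

```python
import math


def softmax(scores):
    """Map a list of real-valued scores to positive values that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    largest = max(scores)
    exps = [math.exp(score - largest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical raw outputs of the three output neurons for one sample.
raw_outputs = [2.0, 1.0, 0.1]
probabilities = softmax(raw_outputs)

print(probabilities)
print(sum(probabilities))
```

The largest raw output keeps the largest share, so the most likely class can be read off as the position of the maximum.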
@@ -403,10 +405,10 @@ Keras distinguishes between two types of weights, namely:
 
 - trainable parameters: these are weights of the neurons that are modified when we train the model in order to minimize our loss function (we will learn about loss functions shortly!).
 
-- non-trainable parameters: these are weights of the neurons that are not changed when we train the model. These could be for many reasons - using a pre-trained model, choice of a particular filter for a convolutional neural network, and statistical weights for batch normalization are some examples.  
+- non-trainable parameters: these are weights of the neurons that are not changed when we train the model. These could be for many reasons - using a pre-trained model, choice of a particular filter for a convolutional neural network, and statistical weights for batch normalization are some examples.
 
 If these reasons are not clear right away, don't worry! In later episodes of this course, we will touch upon a couple of these concepts.
-::: 
+:::
 
 
 ::: instructor
@@ -483,9 +485,9 @@ Model: "functional"
  Non-trainable params: 0 (0.00 B)
 
 ```
-The model has 83 trainable parameters. Each of the 10 neurons in the in the `dense` hidden layer is connected to each of 
-the 4 inputs in the input layer resulting in 40 weights that can be trained. The 10 neurons in the hidden layer are also 
-connected to each of the 3 outputs in the `dense_1` output layer, resulting in a further 30 weights that can be trained. 
+The model has 83 trainable parameters. Each of the 10 neurons in the `dense` hidden layer is connected to each of
+the 4 inputs in the input layer resulting in 40 weights that can be trained. The 10 neurons in the hidden layer are also
+connected to each of the 3 outputs in the `dense_1` output layer, resulting in a further 30 weights that can be trained.
 By default `Dense` layers in Keras also contain 1 bias term for each neuron, resulting in a further 10 bias values for the
 hidden layer and 3 bias terms for the output layer. `40+30+10+3=83` trainable parameters.
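The arithmetic above can be checked with a few lines of plain Python. This is only a sketch of the counting rule for fully connected layers, and the helper name `dense_param_count` is hypothetical, not part of Keras.

```python
def dense_param_count(n_inputs, n_units, use_bias=True):
    """Number of trainable parameters in a fully connected layer."""
    weights = n_inputs * n_units         # one weight per input-to-neuron edge
    biases = n_units if use_bias else 0  # one bias term per neuron
    return weights + biases


hidden = dense_param_count(n_inputs=4, n_units=10)  # 40 weights + 10 biases
output = dense_param_count(n_inputs=10, n_units=3)  # 30 weights + 3 biases

print(hidden, output, hidden + output)  # 50 33 83
```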
 
@@ -524,7 +526,7 @@ So in total 8 extra parameters.
 ```python
 model = keras.Sequential(
     [
-        keras.Input(shape=(X_train.shape[1],)),
+        keras.Input(shape=(x_train.shape[1],)),
         keras.layers.Dense(10, activation="relu"),
         keras.layers.Dense(3, activation="softmax"),
     ]
@@ -571,13 +573,13 @@ This is a measure for how close the distribution of the three neural network out
 It is lower if the distributions are more similar.
 
 For more information on the available loss functions in Keras you can check the
-[documentation](https://www.tensorflow.org/api_docs/python/tf/keras/losses).
+[documentation](https://keras.io/api/losses/).
 
 Next we need to choose which optimizer to use and, if this optimizer has parameters, what values
 to use for those. Furthermore, we need to specify how many times to show the training samples to the optimizer.
 
 Once more, Keras gives us plenty of choices all of which have their own pros and cons,
-but for now let us go with the widely used [Adam optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam).
+but for now let us go with the widely used [Adam optimizer](https://keras.io/api/optimizers/adam/).
 Adam has a number of parameters, but the default values work well for most problems.
 So we will use it with its default parameters.
 
@@ -600,7 +602,7 @@ One training epoch means that every sample in the training data has been shown
 to the neural network and used to update its parameters.
 
 ```python
-history = model.fit(X_train, y_train, epochs=100)
+history = model.fit(x_train, y_train, epochs=100)
 ```
 
 The fit method returns a history object that has a history attribute with the training loss and
@@ -673,7 +675,7 @@ trained network.
 This will return a `numpy` matrix, which we convert
 to a pandas dataframe to easily see the labels.
 ```python
-y_pred = model.predict(X_test)
+y_pred = model.predict(x_test)
 prediction = pd.DataFrame(y_pred, columns=target.columns)
 prediction
 ```
@@ -822,7 +824,7 @@ many hyperparameter and model architecture choices.
 We will go into more depth of these choices in later episodes.
 For now it is important to realize that the parameters we chose were
 somewhat arbitrary and more careful consideration needs to be taken to
-pick hyperparameter values. 
+pick hyperparameter values.
 
 
 ## 10. Share model
@@ -844,7 +846,7 @@ This loaded model can be used as before to predict.
 
 ```python
 # use the pretrained model here
-y_pretrained_pred = pretrained_model.predict(X_test)
+y_pretrained_pred = pretrained_model.predict(x_test)
 pretrained_prediction = pd.DataFrame(y_pretrained_pred, columns=target.columns.values)
 
 # idxmax will select the column for each row with the highest value
diff --git a/fig/.gitkeep b/fig/.gitkeep
diff --git a/md5sum.txt b/md5sum.txt
@@ -5,8 +5,8 @@
 "index.md" "8b5609014b8028029f48266bc751663e" "site/built/index.md" "2025-05-06"
 "links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2025-02-11"
 "workshops.md" "db285c697b5f062098913c08119130ff" "site/built/workshops.md" "2025-10-23"
-"episodes/1-introduction.md" "3602dbf5bab0ed4e21679de5f27d3f47" "site/built/1-introduction.md" "2025-12-09"
-"episodes/2-keras.md" "ab5ebd62fd3e6cc2ad69bd805d26f7f1" "site/built/2-keras.md" "2025-09-03"
+"episodes/1-introduction.md" "41c91d12c3e45b5bc5f339a0a4aae666" "site/built/1-introduction.md" "2025-12-09"
+"episodes/2-keras.md" "5583f21576ffb21ba9b37c7c5aab7add" "site/built/2-keras.md" "2025-12-09"
 "episodes/3-monitor-the-model.md" "8eda70a03e5225033ba05563b517ec88" "site/built/3-monitor-the-model.md" "2025-09-02"
 "episodes/4-advanced-layer-types.md" "eaca1f96ead3140467a4ba7f069284fa" "site/built/4-advanced-layer-types.md" "2025-09-03"
 "episodes/5-transfer-learning.md" "65ed8dff158123d8271100fd8d18fd3b" "site/built/5-transfer-learning.md" "2025-12-09"
@@ -17,6 +17,6 @@
 "instructors/schedule.md" "332b32d24f144b29a280176e6b5d015f" "site/built/schedule.md" "2025-03-10"
 "instructors/survey-templates.md" "ea5d46e7b54d335f79e57a7bc31d1c5c" "site/built/survey-templates.md" "2025-02-11"
 "learners/reference.md" "6a11d5269dc9d1d31d4016086580e838" "site/built/reference.md" "2025-12-09"
-"learners/setup.md" "2e741a6d76091da5c832ab010380f0e1" "site/built/setup.md" "2025-09-01"
+"learners/setup.md" "dbf2c7a18b7bc9ab5d404024b2e48930" "site/built/setup.md" "2025-12-09"
 "paper/paper.md" "2b05562e0f9d393818ad4e7ea2d5c152" "site/built/paper.md" "2025-10-20"
 "profiles/learner-profiles.md" "ef0f26dd0874387d80ed3fd468b99e23" "site/built/learner-profiles.md" "2025-02-11"
diff --git a/setup.md b/setup.md
@@ -28,7 +28,7 @@ Open a terminal (Mac/Linux) or Command Prompt (Windows) and run the following co
 
 ::: spoiler
 
-### On Linux/macOs
+### On Linux/macOS
 
 ```shell
 python3 -m venv dl_workshop
@@ -50,7 +50,7 @@ py -m venv dl_workshop
 
 ::: spoiler
 
-### On Linux/macOs
+### On Linux/macOS
 
 ```shell
 source dl_workshop/bin/activate
@@ -74,32 +74,24 @@ Remember that you need to activate your environment every time you restart your
 
 ::: spoiler
 
-### On Linux/macOs
+### On Linux/macOS
 
 ```shell
-python3 -m pip install jupyter seaborn scikit-learn pandas tensorflow pydot
+python3 -m pip install jupyter seaborn scikit-learn pandas keras torch pydot
 ```
 
-Note for MacOS users: there is a package `tensorflow-metal` which accelerates the training of machine learning models with TensorFlow on a recent Mac with a Silicon chip (M1/M2/M3).
-However, the installation is currently broken in the most recent version (as of January 2025), see the [developer forum](https://developer.apple.com/forums/thread/772147).
-
 :::
 
 ::: spoiler
 
 ### On Windows
 
 ```shell
-py -m pip install jupyter seaborn scikit-learn pandas tensorflow pydot
+py -m pip install jupyter seaborn scikit-learn pandas keras torch pydot
 ```
 
 :::
-
-Note: Tensorflow makes Keras available as a module too.
-
-An [optional challenge in episode 2](episodes/2-keras.md) requires installation of Graphviz
-and instructions for doing that can be found
-[by following this link](https://graphviz.org/download/).
+An [optional challenge in episode 2](episodes/2-keras.md) requires installation of Graphviz. Instructions for doing that can be found [by following this link](https://graphviz.org/download/).
 
 ## Starting Jupyter Lab
 
@@ -108,7 +100,7 @@ Jupyter Lab is compatible with Firefox, Chrome, Safari and Chromium-based browse
 Note that Internet Explorer and Edge are *not* supported.
 See the [Jupyter Lab documentation](https://jupyterlab.readthedocs.io/en/latest/getting_started/accessibility.html#compatibility-with-browsers-and-assistive-technology) for an up-to-date list of supported browsers.
 
-To start Jupyter Lab, open a terminal (Mac/Linux) or Command Prompt (Windows), 
+To start Jupyter Lab, open a terminal (Mac/Linux) or Command Prompt (Windows),
 make sure that you activated the virtual environment you created for this course,
 and type the command:
 
@@ -121,31 +113,38 @@ To check whether all packages installed correctly, start a jupyter notebook in j
 explained above. Run the following lines of code:
 ```python
 import sklearn
-print('sklearn version: ', sklearn.__version__)
+print(f'Sklearn version: {sklearn.__version__}')
 
 import seaborn
-print('seaborn version: ', seaborn.__version__)
+print(f'Seaborn version: {seaborn.__version__}')
 
 import pandas
-print('pandas version: ', pandas.__version__)
+print(f'Pandas version: {pandas.__version__}')
+
+import torch
+print(f'PyTorch version: {torch.__version__}')
+
+# Note: Before importing Keras, we have to instruct it to use PyTorch as the backend.
+import os
+os.environ['KERAS_BACKEND'] = 'torch'
 
-import tensorflow
-print('Tensorflow version: ', tensorflow.__version__)
+import keras
+print(f'Keras version: {keras.__version__}')
 ```
 
 This should output the versions of all required packages without giving errors.
 Most versions will work fine with this lesson, but:
-- For Keras and Tensorflow, the minimum version is 2.12.0
+- For Keras, the minimum version is 3.0.0 (the PyTorch backend is only available from Keras 3 onwards)
 - For sklearn, the minimum version is 1.2.2
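
If you prefer to compare a printed version against a minimum programmatically, a minimal sketch could look like the following. The helper names `version_tuple` and `meets_minimum` are hypothetical, and the parsing is deliberately simple (leading numeric components only); the `packaging` library offers more robust version handling.

```python
import re


def version_tuple(version):
    """Leading numeric parts of a version string, e.g. '2.5.1+cu121' -> (2, 5, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric version found in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum`, padding short versions with zeros."""
    have, need = version_tuple(installed), version_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# Example with the sklearn minimum stated above; the installed versions are made up.
print(meets_minimum("1.5.0", "1.2.2"))  # True
print(meets_minimum("1.2.1", "1.2.2"))  # False
```

In a notebook you would pass the real value instead, for example `meets_minimum(sklearn.__version__, "1.2.2")`.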
 
 ## Fallback option: cloud environment
 If a local installation does not work for you, it is also possible to run this lesson in [Binder Hub](https://mybinder.org/v2/gh/carpentries-lab/deep-learning-intro/scaffolds). This should give you an environment with all the required software and data to run this lesson, nothing which is saved will be stored, please copy any files you want to keep. Note that if you are the first person to launch this in the last few days it can take several minutes to startup. The second person who loads it should find it loads in under a minute. Instructors who intend to use this option should start it themselves shortly before the workshop begins.
 
-Alternatively you can use [Google colab](https://colab.research.google.com/). If you open a jupyter notebook here, the required packages are already pre-installed. Note that google colab uses jupyter notebook instead of Jupyter Lab.
+Alternatively you can use [Google Colab](https://colab.research.google.com/). If you open a jupyter notebook here, the required packages are already pre-installed. Note that Google Colab uses jupyter notebook instead of Jupyter Lab.
 
 ## Downloading the required datasets
 
-Download the [weather dataset prediction csv][weatherdata] and [Dollar street dataset (4 files in total)][dollar-street]
+Download the [Weather dataset prediction csv][weatherdata] and [Dollar street dataset (4 files in total)][dollar-street]
 
 [dollar-street]: https://zenodo.org/api/records/10970014/files-archive
 [jupyter]: http://jupyter.org/