Commit 54d49fa

refine lab explore data; add exercises
1 parent 641fc45 commit 54d49fa

2 files changed

+2742
-1434
lines changed

book/lab_explore_data.ipynb

Lines changed: 63 additions & 18 deletions
@@ -17,7 +17,7 @@
  "id": "86550260-7939-4970-a420-5eb24e8158c2",
  "metadata": {},
  "source": [
- "We want to take a look at this real-world dataset: [https://github.com/OpenNeuroDatasets/ds005420](https://github.com/OpenNeuroDatasets/ds005420)"
+ "We will work here with this real-world dataset of resting-state EEG signals: [https://github.com/OpenNeuroDatasets/ds005420](https://github.com/OpenNeuroDatasets/ds005420)."
  ]
  },
  {
@@ -80,9 +80,7 @@
  "\n",
  "1) Write a unit test (inside `/pycourse/tests/test_data.py`) to make sure the number of subject sub-directories corresponds to the actual number of subjects. **Hint:** Look at the metadata.\n",
  "2) Verify that all subject directories have an eeg sub-directory. \n",
- "3) Verify that all data in a subject directories matches with the subject number. \n",
- "4) Assert that EEG data for all subjects was taken using 20 channels and sampling frequency 500. \n",
- "5) (Optional) Write a file (`discarded_subjects.txt`) with the subject numbers that do not match that criterion. "
+ "3) Assert that EEG data for all subjects was recorded using 20 channels and a sampling frequency of 500 Hz. "
  ]
  },
  {
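The directory checks in exercises 1 and 2 could be sketched roughly as follows. This is a minimal sketch, not the course solution: it assumes the dataset follows the BIDS layout (subject folders named `sub-*` and a `participants.tsv` file with a `participant_id` column), and the helper names `subject_dirs`, `participant_ids`, `subject_count_matches`, `missing_eeg_dirs` and `make_fake_dataset` are hypothetical. The demo runs against a throwaway directory tree, not the real data.

```python
import csv
import tempfile
from pathlib import Path


def subject_dirs(root):
    """Return the sorted `sub-*` directories directly under the dataset root."""
    return sorted(p for p in Path(root).glob("sub-*") if p.is_dir())


def participant_ids(root):
    """Read subject IDs from participants.tsv (assumed BIDS metadata file)."""
    with open(Path(root) / "participants.tsv", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row["participant_id"] for row in reader]


def subject_count_matches(root):
    """True if there are as many subject folders as metadata rows."""
    return len(subject_dirs(root)) == len(participant_ids(root))


def missing_eeg_dirs(root):
    """Return the subject folders that lack an `eeg` sub-directory."""
    return [p for p in subject_dirs(root) if not (p / "eeg").is_dir()]


def make_fake_dataset(root, n_subjects=3):
    """Build a tiny stand-in dataset so the checks can be tried without downloads."""
    rows = ["participant_id"]
    for i in range(1, n_subjects + 1):
        subject = f"sub-{i}"
        (Path(root) / subject / "eeg").mkdir(parents=True)
        rows.append(subject)
    (Path(root) / "participants.tsv").write_text("\n".join(rows) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    make_fake_dataset(tmp, n_subjects=3)
    print(subject_count_matches(tmp), missing_eeg_dirs(tmp))
```

Inside `test_data.py`, each check would typically become one `test_*` function that asserts on the real dataset path.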
@@ -92,9 +90,7 @@
  "source": [
  "## Exploratory data analysis \n",
  "Now we want to look at the data.\n",
- "We find that the data is in a particular format `.edf` that we cannot directly read in python. \n",
- "**Hint:**\n",
- "We need to install a third-party library `mne` to read `.edf` files. \n",
+ "We find that the data is in a format called [European Data Format](https://en.wikipedia.org/wiki/European_Data_Format) (`.edf`), and we need to install a third-party library, `mne`, to read it.\n",
  "You can check out the [library documentation here](https://mne.tools/dev/)"
  ]
  },
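Loading a recording and checking its channel count and sampling frequency might look like the sketch below. It relies on `mne.io.read_raw_edf` and on the `ch_names` and `info["sfreq"]` attributes of the returned object; the helper names `load_raw`, `recording_summary` and `matches_protocol` are hypothetical, and the `fake_raw` stand-in only mimics those two attributes so the check can be tried without a file or an `mne` install.

```python
from types import SimpleNamespace


def load_raw(edf_path):
    """Load one EDF recording with MNE.

    The import is done lazily so the helpers below work without mne installed.
    """
    import mne  # third-party: pip install mne

    return mne.io.read_raw_edf(edf_path, preload=True, verbose="error")


def recording_summary(raw):
    """Return (number of channels, sampling frequency in Hz) for a recording."""
    return len(raw.ch_names), float(raw.info["sfreq"])


def matches_protocol(raw, n_channels=20, sfreq=500.0):
    """True if the recording has the expected channel count and sampling rate."""
    return recording_summary(raw) == (n_channels, sfreq)


# Stand-in object with the same two attributes, used only for illustration.
fake_raw = SimpleNamespace(
    ch_names=[f"EEG ch{i}" for i in range(20)],
    info={"sfreq": 500.0},
)
print(matches_protocol(fake_raw))
```

With the real data, `matches_protocol(load_raw(path))` would be evaluated once per subject file.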
@@ -114,34 +110,83 @@
  "id": "bb817e5c-3096-4d05-9120-ba7b05844403",
  "metadata": {},
  "source": [
- "1) Plot one time series. \n",
- "2) Plot all time series with labels according to channel name. \n",
- "3) Plot the channels that start with \"T\" and \"O\". \n",
- "4) Plot a correlation plot of the \"T\" and \"O\" channels as a heatmap.\n",
- "5) Plot a histogram of `RecordingDuration` across all subjects. "
+ "**Hints:**\n",
+ "\n",
+ "- Look at the function `mne.io.read_raw_edf` to load data.\n",
+ "- Look at the method `.to_data_frame` of the loaded data.\n",
+ "\n",
+ "1) Plot one time series.\n",
+ "2) Clean the column names by removing \"EEG\", e.g. \"EEG C4-A1A2\" -> \"C4-A1A2\".\n",
+ "3) Plot all time series with labels according to channel name. **Hint:** Look at the `melt` method of dataframes.\n",
+ "4) Plot the channels that start with \"P\", \"T\" or \"O\". \n",
+ "5) Plot an all-vs-all correlation heatmap of the \"P\", \"T\" and \"O\" channels. **Hint:** Look up seaborn's documentation on heatmaps.\n",
+ "6) Save the correlation plot in SVG format."
  ]
  },
  {
  "cell_type": "markdown",
  "id": "196f319e-51a8-48bd-91f0-931978a5ec04",
  "metadata": {},
  "source": [
- "## Process data\n",
+ "## Single-subject data\n",
  "After having taken this quick look at the data, we want to start processing the data.\n",
+ "So far we have been working with data from one subject.\n",
+ "\n",
+ "1) Subtract the mean from each channel.\n",
+ "2) Plot the mean-subtracted time series for all channels.\n",
+ "3) Standardize and plot all time series again.\n",
+ "Standardization means: \n",
+ "$$\n",
+ "y = (x - mean) / standardDeviation\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d788061b-93ce-4105-8cf7-9de25efa4f2f",
+ "metadata": {},
+ "source": [
+ "## Multi-subject data\n",
+ "Here we are going to work with data from more than one subject at a time.\n",
  "\n",
- "1) Clean the column names removing \"EEG\", eg \"EEG C4-A1A2\" -> \"C4-A1A2\"\n",
- "2) Substract the mean from each channel \n",
- "3) Plot correlation matrix of all-vs-all channels. **Hint:** Look at seaborn documentation on heatmaps.\n",
- "4) Save the correlation plot as vector graphics."
+ "1) Plot a histogram of `RecordingDuration` across all subjects. **Hint:** Assume we want the data in \"oc_eeg\".\n",
+ "2) Pick 3 EEG channels and plot the time series (aggregated across all subjects) in one plot. Differentiate the lines by channel. **Hint:** Use seaborn and look up the `hue` parameter.\n",
+ "3) Plot a grid of subplots with each plot representing 1 channel (aggregated across subjects). **Hint:** Adapt [this example](https://seaborn.pydata.org/examples/timeseries_facets.html) \n",
+ "4) Pick 5 channels and only 3 time points. Simulate that the subjects belong to 3 groups.\n",
+ "5) Adapt [this example](https://seaborn.pydata.org/examples/pointplot_anova.html) to plot a comparison between channels/subjects/time."
  ]
  },
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "9b7edbe2-0c6f-4dd1-bf92-74f080b2e00f",
+ "id": "14452053-710d-4c46-9d9a-ee075692ec7b",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ce196df5-f404-448a-88cd-1b164a2aafdf",
  "metadata": {},
  "outputs": [],
  "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "503eef31-86a5-41c9-8cd0-246ab18b4a2d",
+ "metadata": {},
+ "source": [
+ "## Consolidate pipeline\n",
+ "\n",
+ "Let's consolidate our workflow into a pipeline.\n",
+ "\n",
+ "- read and assert subfolders\n",
+ "- clean column names\n",
+ "- standardize values\n",
+ "- plot correlations\n",
+ "- run tests"
+ ]
  }
  ],
  "metadata": {
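The middle steps of the consolidated pipeline (clean column names, standardize values, plot correlations) could be sketched as below. This is one possible arrangement under stated assumptions, not the notebook's solution: the function names are hypothetical, the data is synthetic noise, and apart from "EEG C4-A1A2" the channel names are made up for illustration. With real data, the `time` column that `.to_data_frame` returns alongside the channels would need to be dropped or set as the index before standardizing.

```python
import re

import numpy as np
import pandas as pd


def clean_column_names(df):
    """Strip the leading "EEG " label, e.g. "EEG C4-A1A2" -> "C4-A1A2"."""
    return df.rename(columns=lambda name: re.sub(r"^EEG\s+", "", name))


def standardize(df):
    """Apply y = (x - mean) / standardDeviation to every channel (column)."""
    return (df - df.mean()) / df.std()


def select_channels(df, prefixes=("P", "T", "O")):
    """Keep only the channels whose name starts with one of the prefixes."""
    return df[[name for name in df.columns if name.startswith(prefixes)]]


def plot_correlations(corr, out_path="correlations.svg"):
    """Draw the correlation matrix as a heatmap and save it as SVG."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, vmin=-1, vmax=1, cmap="coolwarm", square=True, ax=ax)
    fig.savefig(out_path)  # output format is inferred from the .svg extension
    plt.close(fig)


# Synthetic stand-in for one subject's recording (illustrative channel names).
rng = np.random.default_rng(0)
names = ["EEG C4-A1A2", "EEG P3-A1A2", "EEG T5-A1A2", "EEG O1-A1A2"]
signals = pd.DataFrame(
    rng.normal(loc=5.0, scale=2.0, size=(1000, len(names))), columns=names
)

clean = clean_column_names(signals)
standardized = standardize(clean)
corr = select_channels(standardized).corr()
print(corr.round(2))
```

Calling `plot_correlations(corr)` writes the heatmap to disk; it is left out of the demo so the sketch runs without a plotting backend.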
