|
17 | 17 | "id": "86550260-7939-4970-a420-5eb24e8158c2", |
18 | 18 | "metadata": {}, |
19 | 19 | "source": [ |
20 | | - "We want to take a look at this real-world dataset: [https://github.com/OpenNeuroDatasets/ds005420](https://github.com/OpenNeuroDatasets/ds005420)" |
| 20 | + "In this notebook we work with a real-world dataset of resting-state EEG recordings: [https://github.com/OpenNeuroDatasets/ds005420](https://github.com/OpenNeuroDatasets/ds005420)." |
21 | 21 | ] |
22 | 22 | }, |
23 | 23 | { |
|
80 | 80 | "\n", |
81 | 81 | "1) Write a unit test (inside `/pycourse/tests/test_data.py`) to check that the number of subject sub-directories matches the actual number of subjects. **Hint:** Look at the metadata.\n", |
82 | 82 | "2) Verify that every subject directory has an `eeg` sub-directory.\n", |
83 | | - "3) Verify that all data in a subject directories matches with the subject number. \n", |
84 | | - "4) Assert that EEG data for all subjects was taken using 20 channels and sampling frequency 500. \n", |
85 | | - "5) (Optional) Write a file (`discarded_subjects.txt`) with the subject numbers that do not match that criterion. " |
| 83 | + "3) Assert that the EEG data of all subjects was recorded with 20 channels at a sampling frequency of 500 Hz." |
86 | 84 | ] |
87 | 85 | }, |
88 | 86 | { |
|
92 | 90 | "source": [ |
93 | 91 | "## Exploratory data analysis \n", |
94 | 92 | "Now we want to look at the data.\n", |
95 | | - "We find that the data is in a particular format `.edf` that we cannot directly read in python. \n", |
96 | | - "**Hint:**\n", |
97 | | - "We need to install a third-party library `mne` to read `.edf` files. \n", |
| 93 | + "The data is stored in the [European Data Format](https://en.wikipedia.org/wiki/European_Data_Format) (`.edf`), which Python cannot read out of the box, so we need to install the third-party library `mne`.\n", |
98 | 94 | "You can check out the [library documentation here](https://mne.tools/dev/)" |
99 | 95 | ] |
100 | 96 | }, |
|
114 | 110 | "id": "bb817e5c-3096-4d05-9120-ba7b05844403", |
115 | 111 | "metadata": {}, |
116 | 112 | "source": [ |
117 | | - "1) Plot one time series. \n", |
118 | | - "2) Plot all time series with labels according to channel name. \n", |
119 | | - "3) Plot the channels that start with \"T\" and \"O\". \n", |
120 | | - "4) Plot a correlation plot of the \"T\" and \"O\" channels as a heatmap.\n", |
121 | | - "5) Plot a histogram of `RecordingDuration` across all subjects. " |
| 113 | + "**Hints:**\n", |
| 114 | + "\n", |
| 115 | + "- Use the function `mne.io.read_raw_edf` to load the data.\n", |
| 116 | + "- Use the `.to_data_frame()` method of the loaded data to get a pandas DataFrame.\n", |
| 117 | + "\n", |
| 118 | + "1) Plot the time series of a single channel.\n", |
| 119 | + "2) Clean the column names by removing the \"EEG\" prefix, e.g. \"EEG C4-A1A2\" -> \"C4-A1A2\".\n", |
| 120 | + "3) Plot all time series in one figure, labelled by channel name. **Hint:** Look at the `melt` method of DataFrames.\n", |
| 121 | + "4) Plot only the channels whose names start with \"P\", \"T\" or \"O\".\n", |
| 122 | + "5) Plot the all-vs-all correlation matrix of the \"P\", \"T\" and \"O\" channels as a heatmap. **Hint:** Look up seaborn's documentation on heatmaps.\n", |
| 123 | + "6) Save the correlation plot in SVG format." |
122 | 124 | ] |
123 | 125 | }, |
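A sketch of tasks 2 to 5 in plain pandas. It assumes that `raw.to_data_frame()` returns a `time` column plus one column per channel, and it uses a small synthetic DataFrame with hypothetical channel names in place of a real recording.

```python
import numpy as np
import pandas as pd


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the leading 'EEG ' from channel names: 'EEG C4-A1A2' -> 'C4-A1A2'."""
    return df.rename(columns=lambda c: c.removeprefix("EEG "))


def to_long(df: pd.DataFrame) -> pd.DataFrame:
    """Wide (one column per channel) -> long (time, channel, value) for plotting."""
    return df.melt(id_vars="time", var_name="channel", value_name="value")


def select_channels(df: pd.DataFrame, prefixes=("P", "T", "O")) -> pd.DataFrame:
    """Keep only the channel columns whose names start with one of the prefixes."""
    return df[[c for c in df.columns if c != "time" and c.startswith(tuple(prefixes))]]


# Synthetic stand-in for raw.to_data_frame(); channel names are hypothetical.
rng = np.random.default_rng(0)
n = 500
wide = pd.DataFrame({"time": np.arange(n) / 500.0})
for name in ["EEG C4-A1A2", "EEG P3-A1A2", "EEG T5-A1A2", "EEG O1-A1A2", "EEG O2-A1A2"]:
    wide[name] = rng.normal(size=n)

wide = clean_columns(wide)
long = to_long(wide)     # one row per (time, channel) sample
pto = select_channels(wide)
corr = pto.corr()        # all-vs-all Pearson correlation between P/T/O channels
```

The long frame is the shape seaborn expects for a single labelled plot, e.g. `sns.lineplot(data=long, x="time", y="value", hue="channel")`, and `sns.heatmap(corr)` draws the correlation matrix.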
124 | 126 | { |
125 | 127 | "cell_type": "markdown", |
126 | 128 | "id": "196f319e-51a8-48bd-91f0-931978a5ec04", |
127 | 129 | "metadata": {}, |
128 | 130 | "source": [ |
129 | | - "## Process data\n", |
| 131 | + "## Single-subject data\n", |
130 | 132 | "Having taken a quick look at the data, we now start processing it.\n", |
| 133 | + "For now we work with the data of a single subject.\n", |
| 134 | + "\n", |
| 135 | + "1) Subtract the mean from each channel.\n", |
| 136 | + "2) Plot the mean-subtracted time series of all channels.\n", |
| 137 | + "3) Standardize each channel and plot all time series again.\n", |
| 138 | + "Standardization (z-scoring) maps each sample $x$ of a channel with mean $\\mu$ and standard deviation $\\sigma$ to: \n", |
| 139 | + "$$\n", |
| 140 | + "y = \\frac{x - \\mu}{\\sigma}\n", |
| 141 | + "$$" |
| 142 | + ] |
| 143 | + }, |
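Mean subtraction and z-scoring map naturally onto column-wise pandas operations: `df.mean()` and `df.std()` return one value per column, which broadcasts back over the rows. A minimal sketch, assuming a DataFrame that holds only channel columns (drop `time` first); the channel names are hypothetical.

```python
import pandas as pd


def demean(df: pd.DataFrame) -> pd.DataFrame:
    """Subtract each channel's mean; columns are channels, rows are samples."""
    return df - df.mean()


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """z-score each channel: y = (x - mean) / std, with pandas' default ddof=1."""
    return (df - df.mean()) / df.std()


# Toy data: two channels on very different scales.
channels = pd.DataFrame({"C4-A1A2": [1.0, 2.0, 3.0], "P3-A1A2": [10.0, 20.0, 30.0]})
centered = demean(channels)   # C4 -> [-1, 0, 1], P3 -> [-10, 0, 10]
z = standardize(channels)     # both channels -> [-1, 0, 1]
```

`DataFrame.std` computes the sample standard deviation (`ddof=1`), whereas `scipy.stats.zscore` defaults to `ddof=0`; for recordings with thousands of samples the difference is negligible.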
| 144 | + { |
| 145 | + "cell_type": "markdown", |
| 146 | + "id": "d788061b-93ce-4105-8cf7-9de25efa4f2f", |
| 147 | + "metadata": {}, |
| 148 | + "source": [ |
| 149 | + "## Multi-subject data\n", |
| 150 | + "Here we work with the data of several subjects at the same time.\n", |
131 | 151 | "\n", |
132 | | - "1) Clean the column names removing \"EEG\", eg \"EEG C4-A1A2\" -> \"C4-A1A2\"\n", |
133 | | - "2) Substract the mean from each channel \n", |
134 | | - "3) Plot correlation matrix of all-vs-all channels. **Hint:** Look at seaborn documentation on heatmaps.\n", |
135 | | - "4) Save the correlation plot as vector graphics." |
| 152 | + "1) Plot a histogram of `RecordingDuration` across all subjects. **Hint:** assume we only want the \"oc_eeg\" data.\n", |
| 153 | + "2) Pick 3 EEG channels and plot their time series (aggregated across all subjects) in one plot, distinguishing the lines by channel. **Hint:** Use seaborn and look up the `hue` parameter.\n", |
| 154 | + "3) Plot a grid of subplots, one per channel (aggregated across subjects). **Hint:** Adapt [this example](https://seaborn.pydata.org/examples/timeseries_facets.html).\n", |
| 155 | + "4) Pick 5 channels and only 3 time points. Simulate group membership by assigning each subject to one of 3 groups.\n", |
| 156 | + "5) Adapt [this example](https://seaborn.pydata.org/examples/pointplot_anova.html) to compare channels, subjects and time points." |
136 | 157 | ] |
137 | 158 | }, |
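For the multi-subject tasks, a convenient shape is one long DataFrame with `subject`, `channel`, `time` and `value` columns, which seaborn can aggregate directly. The sketch below assumes each subject's recording is already a wide DataFrame (`time` plus channels); the subject IDs and data are synthetic, and the shuffled round-robin grouping is a made-up simulation, not information from the dataset.

```python
import numpy as np
import pandas as pd


def stack_subjects(frames: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Combine per-subject wide frames (time + channels) into one long frame."""
    parts = []
    for subject, wide in frames.items():
        part = wide.melt(id_vars="time", var_name="channel", value_name="value")
        part["subject"] = subject
        parts.append(part)
    return pd.concat(parts, ignore_index=True)


def assign_groups(long: pd.DataFrame, n_groups: int = 3, seed: int = 0) -> pd.DataFrame:
    """Simulated grouping: shuffle subjects, then assign groups round-robin."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(long["subject"].unique())
    labels = {s: f"group{i % n_groups + 1}" for i, s in enumerate(order)}
    return long.assign(group=long["subject"].map(labels))


# Synthetic data: 6 hypothetical subjects, 3 channels, 4 time samples each.
rng = np.random.default_rng(1)
time = np.arange(4) / 500.0
frames = {
    f"sub-{i:02d}": pd.DataFrame({"time": time, **{ch: rng.normal(size=4) for ch in ("C4", "P3", "O1")}})
    for i in range(1, 7)
}
data = assign_groups(stack_subjects(frames))
# Mean across subjects for each channel and time point.
mean_ts = data.groupby(["channel", "time"])["value"].mean()
```

With this layout, `sns.lineplot(data=data, x="time", y="value", hue="channel")` averages over subjects and draws a confidence band per channel, and `sns.relplot(data=data, x="time", y="value", col="channel", kind="line")` gives one subplot per channel. The round-robin step keeps the simulated groups balanced, which a plain random draw would not guarantee.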
138 | 159 | { |
139 | 160 | "cell_type": "code", |
140 | 161 | "execution_count": null, |
141 | | - "id": "9b7edbe2-0c6f-4dd1-bf92-74f080b2e00f", |
| 162 | + "id": "14452053-710d-4c46-9d9a-ee075692ec7b", |
| 163 | + "metadata": {}, |
| 164 | + "outputs": [], |
| 165 | + "source": [] |
| 166 | + }, |
| 167 | + { |
| 168 | + "cell_type": "code", |
| 169 | + "execution_count": null, |
| 170 | + "id": "ce196df5-f404-448a-88cd-1b164a2aafdf", |
142 | 171 | "metadata": {}, |
143 | 172 | "outputs": [], |
144 | 173 | "source": [] |
| 174 | + }, |
| 175 | + { |
| 176 | + "cell_type": "markdown", |
| 177 | + "id": "503eef31-86a5-41c9-8cd0-246ab18b4a2d", |
| 178 | + "metadata": {}, |
| 179 | + "source": [ |
| 180 | + "## Consolidate pipeline\n", |
| 181 | + "\n", |
| 182 | + "Let's consolidate our workflow into a pipeline.\n", |
| 183 | + "\n", |
| 184 | + "- read the data and assert the sub-folder structure\n", |
| 185 | + "- clean column names\n", |
| 186 | + "- standardize values\n", |
| 187 | + "- plot correlations\n", |
| 188 | + "- run tests" |
| 189 | + ] |
145 | 190 | } |
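One possible way to chain the single-subject steps into a function, sketched under the assumption that the input is a wide DataFrame as returned by `raw.to_data_frame()` (a `time` column plus `EEG ...` channel columns). Reading the files and checking the folder structure are left out because they need the dataset on disk; `run_pipeline` and the other function names are hypothetical.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts; omit inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the leading 'EEG ' from channel names."""
    return df.rename(columns=lambda c: c.removeprefix("EEG "))


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """z-score every column."""
    return (df - df.mean()) / df.std()


def plot_correlations(df: pd.DataFrame, out_path: Path) -> pd.DataFrame:
    """Draw the all-vs-all correlation heatmap and save it (format from the suffix)."""
    corr = df.corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, vmin=-1, vmax=1, cmap="coolwarm", square=True, ax=ax)
    fig.savefig(out_path)
    plt.close(fig)
    return corr


def run_pipeline(wide: pd.DataFrame, prefixes=("P", "T", "O"),
                 out_path: Path = Path("correlations.svg")) -> pd.DataFrame:
    """Hypothetical pipeline: clean names, pick channels, standardize, plot."""
    channels = clean_columns(wide).drop(columns="time")
    channels = channels[[c for c in channels.columns if c.startswith(tuple(prefixes))]]
    return plot_correlations(standardize(channels), out_path)
```

Pearson correlation is invariant to z-scoring, so `standardize` does not change the heatmap itself; it is kept because the pipeline list includes it and the standardized channels are what the time-series plots use. The unit tests can then be run with `pytest` on the tests folder.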
146 | 191 | ], |
147 | 192 | "metadata": { |
|