Commit 641fc45

revise lab_explore_data and solutions
1 parent a614370 commit 641fc45

4 files changed

+1919
-898
lines changed

book/_quarto.yml

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ book:
5050
- 081_pandas.ipynb
5151
- 09_plotting.ipynb
5252
- 10_testing.ipynb
53-
- lab_explore_data_exercises.ipynb
53+
- lab_explore_data.ipynb
5454
- 11_cli.ipynb
5555
- 13_tools.ipynb
5656
- 051_oop.ipynb

book/lab_explore_data.ipynb

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "0076254c-d77a-4333-9c71-807d9c680054",
6+
"metadata": {},
7+
"source": [
8+
"---\n",
9+
"title: \"Lab: Exploring a Dataset\"\n",
10+
"toc: true\n",
11+
"output-file: lab_explore_dataset.html\n",
12+
"---"
13+
]
14+
},
15+
{
16+
"cell_type": "markdown",
17+
"id": "86550260-7939-4970-a420-5eb24e8158c2",
18+
"metadata": {},
19+
"source": [
20+
"We want to take a look at this real-world dataset: [https://github.com/OpenNeuroDatasets/ds005420](https://github.com/OpenNeuroDatasets/ds005420)"
21+
]
22+
},
23+
{
24+
"cell_type": "markdown",
25+
"id": "931fe244-be1a-4935-9e18-081f3ac4f08c",
26+
"metadata": {},
27+
"source": [
28+
"## Download data\n",
29+
"\n",
30+
"### Clone the repository \n",
31+
"Inside the `/pycourse/data` folder:\n",
32+
"```bash\n",
33+
"git clone https://github.com/OpenNeuroDatasets/ds005420\n",
34+
"```\n",
35+
"\n",
36+
"### Install `git-annex`\n",
37+
"The clone contains only references to the data files, not the files themselves, so we still have to pull the actual data down.\n",
38+
"We will need the [git-annex](https://git-annex.branchable.com/install/) tool.\n",
39+
"\n",
40+
"\n",
41+
"### Pull the data\n",
42+
"Open a terminal inside `ds005420` and run:\n",
43+
"```bash\n",
44+
"git-annex get .\n",
45+
"```\n",
46+
"You should see a progress dialog showing `...from s3-PUBLIC...` \n",
47+
"After that, you're ready to go with the exercises."
48+
]
49+
},
50+
{
51+
"cell_type": "markdown",
52+
"id": "2f3d879b-f266-4b1f-96eb-cec0e3c4d2b2",
53+
"metadata": {},
54+
"source": [
55+
":::{ .callout-tip }\n",
56+
"Jupyter Notebooks are ideal for this kind of exploratory task.\n",
57+
"Make a directory called `/python/notebooks` and start a JupyterLab instance there.\n",
58+
"Keeping the notebooks there will help us keep things tidy and make our workflow easier to reproduce later.\n",
59+
":::"
60+
]
61+
},
62+
{
63+
"cell_type": "markdown",
64+
"id": "8570f8a3-9719-4978-a648-7b5971b8062d",
65+
"metadata": {},
66+
"source": [
67+
"## Explore files\n",
68+
"1) List only the sub-directories in the dataset path. \n",
69+
"2) List only the sub-directories with subject data.\n",
70+
"3) Write a function that lists sub-directories with subject data."
71+
]
72+
},
73+
{
74+
"cell_type": "markdown",
75+
"id": "3ebcd69a-5222-4514-9531-7cdf54f782dd",
76+
"metadata": {},
77+
"source": [
78+
"## Validating the data\n",
79+
"We will start by making sure our data/metadata contains the information we expect at a high level.\n",
80+
"\n",
81+
"1) Write a unit test (inside `/pycourse/tests/test_data.py`) to make sure the number of subject sub-directories matches the actual number of subjects. **Hint:** Look at the metadata.\n",
82+
"2) Verify that all subject directories have an `eeg` sub-directory. \n",
83+
"3) Verify that all data files in each subject directory match the subject number. \n",
84+
"4) Assert that the EEG data for all subjects was recorded with 20 channels at a sampling frequency of 500 Hz. \n",
85+
"5) (Optional) Write a file (`discarded_subjects.txt`) with the subject numbers that do not meet these criteria. "
86+
]
87+
},
88+
{
89+
"cell_type": "markdown",
90+
"id": "3af93b8f-e20d-42f8-8c17-0830ba14c8b6",
91+
"metadata": {},
92+
"source": [
93+
"## Exploratory data analysis \n",
94+
"Now we want to look at the data.\n",
95+
"The data is stored in the `.edf` format, which Python cannot read directly. \n",
96+
"**Hint:**\n",
97+
"We need to install the third-party library `mne` to read `.edf` files. \n",
98+
"See the [library documentation](https://mne.tools/dev/)."
99+
]
100+
},
101+
{
102+
"cell_type": "markdown",
103+
"id": "de6b3872-f101-4fe1-ac5c-079d7a13887d",
104+
"metadata": {},
105+
"source": [
106+
":::{ .callout-tip }\n",
107+
"It's a *very* good idea to look at a tool's documentation before installing it.\n",
108+
"Executing someone else's code is a potential risk, so you should try to find out whether you can actually trust the source.\n",
109+
":::"
110+
]
111+
},
112+
{
113+
"cell_type": "markdown",
114+
"id": "bb817e5c-3096-4d05-9120-ba7b05844403",
115+
"metadata": {},
116+
"source": [
117+
"1) Plot one time series. \n",
118+
"2) Plot all time series with labels according to channel name. \n",
119+
"3) Plot the channels whose names start with \"T\" or \"O\". \n",
120+
"4) Plot the correlations between the \"T\" and \"O\" channels as a heatmap.\n",
121+
"5) Plot a histogram of `RecordingDuration` across all subjects. "
122+
]
123+
},
124+
{
125+
"cell_type": "markdown",
126+
"id": "196f319e-51a8-48bd-91f0-931978a5ec04",
127+
"metadata": {},
128+
"source": [
129+
"## Process data\n",
130+
"After this quick look at the data, we want to start processing it.\n",
131+
"\n",
132+
"1) Clean the column names by removing the \"EEG\" prefix, e.g. \"EEG C4-A1A2\" -> \"C4-A1A2\"\n",
133+
"2) Subtract the mean from each channel \n",
134+
"3) Plot the correlation matrix of all-vs-all channels. **Hint:** Look at the seaborn documentation on heatmaps.\n",
135+
"4) Save the correlation plot as vector graphics."
136+
]
137+
},
138+
{
139+
"cell_type": "code",
140+
"execution_count": null,
141+
"id": "9b7edbe2-0c6f-4dd1-bf92-74f080b2e00f",
142+
"metadata": {},
143+
"outputs": [],
144+
"source": []
145+
}
146+
],
147+
"metadata": {
148+
"kernelspec": {
149+
"display_name": "Python 3 (ipykernel)",
150+
"language": "python",
151+
"name": "python3"
152+
},
153+
"language_info": {
154+
"codemirror_mode": {
155+
"name": "ipython",
156+
"version": 3
157+
},
158+
"file_extension": ".py",
159+
"mimetype": "text/x-python",
160+
"name": "python",
161+
"nbconvert_exporter": "python",
162+
"pygments_lexer": "ipython3",
163+
"version": "3.13.3"
164+
}
165+
},
166+
"nbformat": 4,
167+
"nbformat_minor": 5
168+
}
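
As a possible starting point for the "Explore files" and "Process data" exercises in `lab_explore_data.ipynb`, here is a minimal sketch using only the standard library. The function names `list_subject_dirs` and `clean_channel_name` are hypothetical, and the code assumes that subject folders use the BIDS-style `sub-<label>` naming and that channel names carry the `"EEG "` prefix shown in the lab's example. It is not the course's reference solution.

```python
from pathlib import Path


def list_subject_dirs(dataset_path):
    """Return the sorted subject sub-directories directly under dataset_path.

    Assumption: subject folders are named `sub-<label>` (BIDS-style).
    """
    root = Path(dataset_path)
    return sorted(
        entry
        for entry in root.iterdir()
        # Keep directories only, and only those that look like subject folders.
        if entry.is_dir() and entry.name.startswith("sub-")
    )


def clean_channel_name(name, prefix="EEG "):
    """Drop a leading prefix from a channel name, leaving other names unchanged.

    Assumption: the prefix is "EEG " as in the lab's "EEG C4-A1A2" example.
    """
    return name[len(prefix):] if name.startswith(prefix) else name
```

For example, `list_subject_dirs("ds005420")` could be called from the folder that contains the cloned dataset (the relative path is an assumption), and `clean_channel_name` can be applied to each column name before computing correlations.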

book/lab_explore_data_exercises.ipynb

Lines changed: 0 additions & 149 deletions
This file was deleted.
