Commit ff4da5e

add tsv and read/write exercises
1 parent a87f128 commit ff4da5e

6 files changed, +268 -12 lines changed

book/062_read_and_write.ipynb

Lines changed: 14 additions & 1 deletion
@@ -432,6 +432,19 @@
  "Material under construction\n",
  "🚧"
  ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4923c854-bb58-4ab0-8ec5-f9a7815bb7ab",
+ "metadata": {},
+ "source": [
+ "## Exercises\n",
+ "\n",
+ "Now that we know how to read files from within Python, we can tidy things up a bit from the previous chapter.\n",
+ "\n",
+ "1) Download the data from the repository [here](https://github.com/OpenNeuroDatasets/ds005588/blob/main/sub-336/func/sub-336_task-respo_run-2_events.tsv) into a local file on your computer.\n",
+ "2) Write a function that takes a path and returns the list of dictionaries from exercise 8.5. Read the file content using `pathlib`."
+ ]
  }
  ],
  "metadata": {
@@ -450,7 +463,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.10.13"
+ "version": "3.13.3"
  }
  },
  "nbformat": 4,
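
For exercise 2 in the new cell above, one possible shape of the solution is sketched here. Exercise 8.5 is not part of this commit, so the sketch assumes the expected result is one dictionary per row, keyed by the column headers. The function name `read_tsv`, the sample column names, and the sample values are hypothetical.

```python
import tempfile
from pathlib import Path


def read_tsv(path):
    """Return the rows of a tab-separated file as a list of dictionaries."""
    # Path.read_text opens, reads, and closes the file in one call.
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Assumption: the first line holds the column names.
    header = lines[0].split("\t")
    # Pair each value with its column name; blank lines are skipped.
    return [dict(zip(header, line.split("\t"))) for line in lines[1:] if line]


# Hypothetical stand-in for the downloaded events file, so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "events.tsv"
    sample.write_text(
        "onset\tduration\ttrial_type\n0.0\t1.5\tgo\n2.0\t1.5\tstop\n",
        encoding="utf-8",
    )
    events = read_tsv(sample)

print(events)
```

All values come back as strings; converting them to numbers is left to the caller.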

book/063_common_data_formats.ipynb

Lines changed: 146 additions & 10 deletions
@@ -93,7 +93,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 3,
+ "execution_count": 4,
  "id": "0dad7357-4e14-4d1e-9856-0f6b487aff7b",
  "metadata": {},
  "outputs": [
@@ -160,7 +160,7 @@
  "3 2020-01-03 x19 28"
  ]
  },
- "execution_count": 3,
+ "execution_count": 4,
  "metadata": {},
  "output_type": "execute_result"
  }
@@ -182,7 +182,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 4,
+ "execution_count": 5,
  "id": "16cd3840-8bcf-411d-97d4-073bd7451543",
  "metadata": {},
  "outputs": [],
@@ -192,7 +192,7 @@
  },
  {
  "cell_type": "code",
- "execution_count": 36,
+ "execution_count": 6,
  "id": "4831254b-2b1a-4b52-a105-214005c79048",
  "metadata": {},
  "outputs": [
@@ -212,6 +212,137 @@
  "!cat /tmp/data.csv"
  ]
  },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "be179048-6908-470d-aab2-e2b3690aea81",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#| code-fold: true\n",
+ "#| output: false\n",
+ "\n",
+ "df.to_csv(\"/tmp/data.tsv\", sep=\"\\t\", index=False)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7d498565-0f46-40dc-8ad8-56cfcc9f2db6",
+ "metadata": {},
+ "source": [
+ "## TSV\n",
+ "Similar to CSV, the [Tab Separated Values format](https://en.wikipedia.org/wiki/Tab-separated_values) uses tabs instead of commas to separate the values on each line."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "b530c8be-648d-46dd-913e-1d0bc0534e38",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "date\tid\tage\n",
+ "2020-01-01\tx12\t19\n",
+ "2020-01-02\tx11\t23\n",
+ "2020-01-02\tx3\t22\n",
+ "2020-01-03\tx19\t28\n"
+ ]
+ }
+ ],
+ "source": [
+ "!cat /tmp/data.tsv"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "98550025-bf16-4876-81dd-6acd2f058ec5",
+ "metadata": {},
+ "source": [
+ "Luckily `pandas` can handle that too."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "a118b390-3e9c-4bd5-a925-eb68bd089f2a",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<style scoped>\n",
+ " .dataframe tbody tr th:only-of-type {\n",
+ " vertical-align: middle;\n",
+ " }\n",
+ "\n",
+ " .dataframe tbody tr th {\n",
+ " vertical-align: top;\n",
+ " }\n",
+ "\n",
+ " .dataframe thead th {\n",
+ " text-align: right;\n",
+ " }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr style=\"text-align: right;\">\n",
+ " <th></th>\n",
+ " <th>date</th>\n",
+ " <th>id</th>\n",
+ " <th>age</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>0</th>\n",
+ " <td>2020-01-01</td>\n",
+ " <td>x12</td>\n",
+ " <td>19</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>1</th>\n",
+ " <td>2020-01-02</td>\n",
+ " <td>x11</td>\n",
+ " <td>23</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2</th>\n",
+ " <td>2020-01-02</td>\n",
+ " <td>x3</td>\n",
+ " <td>22</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>3</th>\n",
+ " <td>2020-01-03</td>\n",
+ " <td>x19</td>\n",
+ " <td>28</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " date id age\n",
+ "0 2020-01-01 x12 19\n",
+ "1 2020-01-02 x11 23\n",
+ "2 2020-01-02 x3 22\n",
+ "3 2020-01-03 x19 28"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df = pd.read_csv(\"/tmp/data.tsv\", sep=\"\\t\")\n",
+ "df"
+ ]
+ },
  {
  "cell_type": "markdown",
  "id": "9ffc3c66-3e20-4d95-93bd-f20312f649a0",
@@ -851,12 +982,17 @@
  ]
  },
  {
- "cell_type": "code",
- "execution_count": null,
- "id": "17fd5529-094d-463a-98cf-2ca6e7a47746",
+ "cell_type": "markdown",
+ "id": "b6c484e3-d432-4c77-8df1-c12ac77b3146",
  "metadata": {},
- "outputs": [],
- "source": []
+ "source": [
+ "## Exercises\n",
+ "Download the files `dataset_description.json` and `participants.tsv`\n",
+ "from [this public repo](https://github.com/OpenNeuroDatasets/ds005588).\n",
+ "\n",
+ "1) Read in the file `dataset_description.json` and print the field `Description`.\n",
+ "2) Read in the file `participants.tsv` and print the mean of the values in the `age` column."
+ ]
  }
  ],
  "metadata": {
@@ -875,7 +1011,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.10.13"
+ "version": "3.13.3"
  }
  },
  "nbformat": 4,
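
The two exercises added to this notebook could be approached along the lines below. This is a sketch using only the standard library, not the chapter's reference solution. The helper names `read_description` and `mean_age`, the sample file contents, and the idea that missing ages appear as non-numeric text such as `n/a` are all assumptions made for illustration.

```python
import csv
import json
import statistics
import tempfile
from pathlib import Path


def read_description(path):
    """Return the `Description` field of a JSON file, or None if it is absent."""
    metadata = json.loads(Path(path).read_text(encoding="utf-8"))
    return metadata.get("Description")


def mean_age(path):
    """Return the mean of the numeric entries in the `age` column of a TSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    ages = []
    for row in rows:
        try:
            ages.append(float(row["age"]))
        except (TypeError, ValueError):
            # Assumption: non-numeric entries (for example "n/a") mean "missing".
            continue
    return statistics.mean(ages)


# Made-up stand-ins for the downloaded files, so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "dataset_description.json"
    json_path.write_text(
        json.dumps({"Name": "demo", "Description": "A made-up description"}),
        encoding="utf-8",
    )
    tsv_path = Path(tmp) / "participants.tsv"
    tsv_path.write_text(
        "participant_id\tage\nsub-01\t20\nsub-02\t30\nsub-03\tn/a\n",
        encoding="utf-8",
    )
    description = read_description(json_path)
    average = mean_age(tsv_path)

print(description)
print(average)
```

With pandas, as used earlier in the notebook, `pd.read_csv(path, sep="\t")["age"].mean()` is the shorter route for the second exercise.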

book/06_imports.ipynb

Lines changed: 3 additions & 0 deletions
@@ -196,6 +196,9 @@
  "```python\n",
  "# preprocessing.py\n",
  "\n",
+ "\n",
+ "# TODO: add missing imports to make it work\n",
+ "\n",
  "def cleanup(data):\n",
  " ... \n",
  " \n",

book/_quarto.yml

Lines changed: 1 addition & 1 deletion
@@ -42,8 +42,8 @@ book:
  - 05_comprehensions.ipynb
  - lab_builtin_for_the_win.ipynb
  - 06_imports.ipynb
- - 061_dependencies.ipynb
  - 062_read_and_write.ipynb
+ - 061_dependencies.ipynb
  - 063_common_data_formats.ipynb
  - 051_oop.ipynb
  - 08_numpy.ipynb

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -8,4 +8,5 @@ dependencies = [
  "jupyterlab>=4.4.10",
  "jupyterlab-spellchecker>=0.8.4",
  "jupyterlab-vim>=4.1.4",
+ "pandas>=2.3.3",
  ]
