Commit 17f46d6

minor re-org
1 parent 036dd06 commit 17f46d6

5 files changed

Lines changed: 14 additions & 82 deletions

_toc.yml

Lines changed: 3 additions & 4 deletions
@@ -1,5 +1,4 @@
 # Table of contents
-# Learn more at https://jupyterbook.org/customize/toc.html
 
 format: jb-book
 root: welcome
@@ -31,21 +30,21 @@ parts:
 - caption: Transform
 numbered: true
 chapters:
-- file: joins
 - file: boolean-data
 - file: numbers
 - file: strings
 - file: regex
 - file: categorical-data
 - file: dates-and-times
 - file: missing-values
+- file: joins
 - caption: Import
 numbered: true
 chapters:
 - file: spreadsheets
-- file: webscraping-and-apis
-- file: rectangling
 - file: databases
+- file: rectangling
+- file: webscraping-and-apis
 - caption: Programme
 numbered: true
 chapters:

introduction.ipynb

Lines changed: 6 additions & 72 deletions
@@ -122,9 +122,13 @@
 "\n",
 "### R, Julia, and friends\n",
 "\n",
-"In this book, you won't learn anything about R, Julia, or any other programming language useful for data science. This isn't because we think these tools are bad. They're not! And in practice, most data science teams use a mix of languages, typically R and Python. However, you may find it easier to learn one tool at a time.\n",
+"In this book, you won't learn anything about R, Julia, or any other programming language useful for data science. This isn't because we think these tools are bad. They're not! And in practice, most data science teams use a mix of languages. However, you may find it easier to learn one set of tools at a time. In this book you'll see what we think of as the three critical tools for data science:\n",
 "\n",
-"This book uses Python, which is usually ranked as the first or second most popular programming language in the world and, just as importantly, it’s also one of the easiest to learn. It’s a general purpose language, which means it can perform a wide range of tasks. This combination of features is why people say Python has a low floor and a high ceiling. It’s also very versatile; the joke goes that Python is the 2nd best language at everything, and there’s some truth to that (although Python is 1st best at some tasks, like machine learning). But a language that covers such a lot of ground is also very useful; and Python is widely used across industry, academia, and the public sector, and is often taught in schools too.\n",
+"- Python\n",
+"- SQL\n",
+"- command line scripting\n",
+"\n",
+"This book predominantly uses Python, which is usually ranked as the first or second most popular programming language in the world and, just as importantly, it’s also one of the easiest to learn. It’s a general purpose language, which means it can perform a wide range of tasks. This combination of features is why people say Python has a low floor and a high ceiling. It’s also very versatile; the joke goes that Python is the 2nd best language at everything, and there’s some truth to that (although Python is 1st best at some tasks, like machine learning). But a language that covers such a lot of ground is also very useful; and Python is widely used across industry, academia, and the public sector, and is often taught in schools too.\n",
 "\n",
 "We think Python is a great place to start your data science journey because it is the most popular tool for data science and programming more generally, with a large community behind it.\n",
 "\n",
@@ -150,76 +154,6 @@
 "print(\"Compiled with Python version:\", sys.version)"
 ]
 },
-{
-"attachments": {},
-"cell_type": "markdown",
-"id": "9a885521",
-"metadata": {},
-"source": [
-"## Running Python code\n",
-"\n",
-"Now you will create and run your first code. If you get stuck, there's a more in-depth tutorial over at the [VS Code documentation](https://code.visualstudio.com/docs/python/python-tutorial).\n",
-"\n",
-"Create a new folder for your work (perhaps named 'python4DS', no white space), open that folder with Visual Studio Code and create a new file, naming it `hello_world.py`. The file extension, `.py`, is very important as it implicitly tells Visual Studio Code that this is a Python script. In the Visual Studio Code editor, add a single line to the file:\n",
-"\n",
-"```python\n",
-"print('Hello World!')\n",
-"```\n",
-"\n",
-"Save the file.\n",
-"\n",
-"If you named this file with the extension `.py` then VS Code will recognise that it is Python code and you should see the name and version of Python pop up in the blue bar at the bottom of your VS Code window. Make sure that the version of Python displayed here is the Anaconda version that you just installed rather than one that comes built-in with your operating system (this is particularly an issue on Mac). If you have a fresh install of Anaconda's distribution of Python, you'll probably see something like `Python 3.9 64-bit ('base': conda)`. To change which Python version your code uses, click on the version shown in the blue bar and select the version you want. If you've just changed Python version, it can be a good idea to restart VS Code so that all the versions of Python on your system are picked up by it.\n",
-"\n",
-"When you press save, you may get messages about installing extra packages or making Pylance your default language server; just go with VS Code's suggestions here, except the one about the terminal and conda, which you can say no to.\n",
-"\n",
-"Alright, shall we actually run some code? Select/highlight the `print('Hello world!')` text you typed in the file and right-click to bring up some options including 'Run Selection/Line in Terminal' and `Run Selection/Line in Interactive Window'. Because VS Code is a richly featured IDE, there are lots of options for how to run the file. Let's try both of the main ways: via the interactive window and using the \"terminal\" (more on what that is later).\n",
-"\n",
-"The interactive window is a convenient and flexible way to run code that you have open in a script or that you type directly into the interactive window code box. The interactive window will 'remember' any variables that have been assigned (for examples, code statements like `x = 5`), whether they came from running some lines in your script or from you typing them in directly. Working with the interactive window will feel familiar to anyone who has used Stata, Matlab, or R, and is much more suited to the way economists tend to work because it doesn't require you to write the whole script, start to finish, ahead of time. Instead, you can jam, changing code as you go, (re-)running it line by line.\n",
-"\n",
-"To run the code in an interactive window, **right-click and select 'Run Selection/Line in Interactive Window'**. This should cause a new 'interactive' panel to appear within Visual Studio Code, and only the selected line will execute within it. At this point, you may see a message about Visual Studio Code's default behaviour when you press <kbd>Shift</kbd> + <kbd>Enter</kbd>; for this book, it's good to have <kbd>Shift</kbd> + <kbd>Enter</kbd> default to running a line in the interactive window. The box below has instructions for how to ensure this always happens.\n",
-"\n",
-"```{admonition} Make code run in the interactive window by default\n",
-":class: dropdown\n",
-"\n",
-"Open up Visual Studio Code and go to settings (click on the cog in the bottom left-hand corner, then click settings).\n",
-"\n",
-"Type 'python send' into the search box. Depending on your configuration and Visual Studio Code version, you will either see 'Python › Data Science: Send Selection To Interactive Window' or 'Jupyter: Send Selection To Interactive Window'. Make sure that there is a tick in the box.\n",
-"\n",
-"This will ensure that when you hit shift+enter on code scripts, it will execute your code in Visual Studio's interactive window (starting a new window if necessary).\n",
-"```\n",
-"\n",
-"Let's make more use of the interactive window. At the bottom of it, there is a box that says 'Type code here and press shift-enter to run'. Go ahead and type `print('Hello World!')` directly in there to achieve the same effect as running the line from your script. Also, any variables you run in the interactive window (from your script or directly by entering them in the box) will persist.\n",
-"\n",
-"To see how variables persist, type `hello_string = 'Hello World!'` into the interactive window's code entry box and hit shift-enter. If you now type `hello_string` and hit shift+enter, you will see the contents of the variable you just created. You can also click the grid symbol at the top of the interactive window (between the stop symbol and the save file symbol); this is the variable explorer and will pop open a panel showing all of the variables you've created in this interactive session. You should see one called `hello_string` of type `str` with a value `Hello World!`.\n",
-"\n",
-"This shows the two ways of working with the interactive window--running (segments) from a script, or writing code directly in the entry box.\n",
-"\n",
-"```{admonition} Start interactive windows and terminals within your project directory\n",
-":class: dropdown\n",
-"In Visual Studio Code, you can ensure that the interactive window starts in the root directory of your project by setting \"Jupyter: Notebook File Root\" to `${workspaceFolder}` in the Settings menu. For the integrated command line, change \"Terminal › Integrated: Cwd\" to `${workspaceFolder}` too.\n",
-"```\n",
-"\n",
-"To run code the other way, in the terminal, right-click and select 'Run Python file in terminal'. This will bring up a new panel (called a terminal) *within* Visual Studio Code that runs your entire script from top to bottom-and you should see 'Hello World!' pop up! Although we're trying out running code in the terminal, the typical economics workflow would be to work with the interactive window.\n",
-"\n",
-"```{admonition} Exercise\n",
-"Create a new script that, when run, prints \"Welcome to Python for Data Science\" and run it in an interactive window.\n",
-"```\n"
-]
-},
-{
-"attachments": {},
-"cell_type": "markdown",
-"id": "4f2457a2",
-"metadata": {},
-"source": [
-"## Alternative ways to run the code from the book\n",
-"\n",
-"As well as following this book using your own computer or on the cloud via Github Codespaces, you can run the code online through a few other options. The first is the easiest to get started with.\n",
-"\n",
-"1. [Google Colab notebooks](https://research.google.com/colaboratory/). Free for most use. You can launch most pages in this book interactively by using the 'Colab' button under the rocket symbol at the top of the page. It will be in the form of a notebook (which mixes code and text) rather than a script (.py file) but the code you write is the same.\n",
-"2. [Gitpod Workspace](https://www.gitpod.io/). An alternative to Codespaces. This is a remote, cloud-based version of Visual Studio Code with Python installed and will run Python scripts. Note that the free tier covers 50 hours per month."
-]
-},
 {
 "attachments": {},
 "cell_type": "markdown",

rectangling.ipynb

Lines changed: 2 additions & 2 deletions
@@ -6,11 +6,11 @@
 "metadata": {},
 "source": [
 "(rectangling)=\n",
-"# Rectangling\n",
+"# Nested Data\n",
 "\n",
 "## Introduction\n",
 "\n",
-"In this chapter, you'll learn the art of data **rectangling**, taking data that is fundamentally tree-like and converting it into a rectangular data frames made up of rows and columns. This is important because hierarchical data is surprisingly common, especially when working with data that comes from a web API (such as you will see in {ref}`webscraping-and-apis`).\n",
+"In this chapter, you'll learn about **nested data**, working with data that is fundamentally tree-like and (often) converting it into a rectangular data frames made up of rows and columns. This is important because nested data is surprisingly common, especially when working with data that comes from a web API (such as you will see in {ref}`webscraping-and-apis`).\n",
 "\n",
 "To learn about rectangling, you'll first learn about lists, dictionaries, and the JSON format, as these are the data structures that are most often used to work with hierarchical data in Python. Then you'll learn about some functions that can help you turn hierarchical data into 'tidy' data in columns and rows. We'll then show you a few case studies, applying these simple function multiple times to solve real complex problems.\n"
 ]

visualise.md

Lines changed: 3 additions & 0 deletions
@@ -38,6 +38,9 @@ Let's look at each in a bit more detail.
 The first of the three kinds is *exploratory data visualisation*, and it's the kind that you do when you're looking and data and trying to understand it. Just plotting the data is a really good strategy for getting a feel for any issues there might be. This is perhaps most famously demonstrated by Anscombe's quartet: four different datasets with the same mean, standard deviation, and correlation but very different data distributions.
 
 ```{code-cell} ipython3
+---
+tags: [remove-input]
+---
 import numpy as np
 import pandas as pd
 import matplotlib.pyplot as plt

workflow-help.md

Lines changed: 0 additions & 4 deletions
@@ -66,7 +66,3 @@ df
 - **Code**: copy and paste the minimal reproducible example code (including the packages, as noted above). Make sure you've used spaces and your variable names are concise, yet informative. Use comments to indicate where your problem lies. Do your best to remove everything that is not related to the problem. Finally, the shorter your code is, the easier it is to understand, and the easier it is to fix.
 
 Finish by checking that you have actually made a reproducible example by starting a fresh Python session and copying and pasting your reprex in.
-
-## Investing in yourself
-
-You should also spend some time preparing yourself to solve problems before they occur. Investing a little time in learning Python each day will pay off in the long run!
