Commit 8040a9d

differences for PR #69
1 parent 6f0b88d commit 8040a9d

12 files changed

Lines changed: 57 additions & 57 deletions

01-relational-database.md

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -36,7 +36,7 @@ Databases are designed to allow efficient querying against very large tables, mo
3636

3737
## What is a table?
3838

39-
As were have noted above, a single table is very much like a spreadsheet. It has rows and it has columns. A row represents a single observation and the columns represents the various variables contained within that observation.
39+
As we have noted above, a single table is very much like a spreadsheet. It has rows and it has columns. A row represents a single observation and the columns represent the various variables contained within that observation.
4040
Often one or more columns in a row will be designated as a 'primary key' This column or combination of columns can be used to uniquely identify a specific row in the table.
4141
The columns typically have a name associated with them indicating the variable name. A column always represents the same variable for each row contained in the table. Because of this the data in each column will always be of the same *type*, such as an Integer or Text, of values for all of the rows in the table. Datatypes are discussed in the next section.
4242

@@ -108,7 +108,7 @@ for these and use the built-in Date And Time Functions to manipulate them. We wi
108108

109109
## Why do tables have primary key columns?
110110

111-
Whenever you create a table, you will have the option of designating one of the columns as the primary key column. The main property of the primary key column is that the values contained in it must uniquely identify that particular row. That is you cannot have duplicate primary keys. This can be an advantage which adding rows to the table as you will not be allowed to add the same row (or a row with the same primary key) twice.
111+
Whenever you create a table, you will have the option of designating one of the columns as the primary key column. The main property of the primary key column is that the values contained in it must uniquely identify that particular row. That is you cannot have duplicate primary keys. This can be an advantage when adding rows to the table as you will not be allowed to add the same row (or a row with the same primary key) twice.
112112

113113
The primary key column for a table is usually of type Integer although you could have Text. For example if you had a table of car information, then the "Reg\_No" column could be made the primary key as it can be used to uniquely identify a particular row in the table.
114114
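
The uniqueness rule described in this hunk can be sketched in SQLite as follows. This is only an illustration: the `Cars` table, its `Make` column and the sample values are hypothetical, and only the `Reg_No` column name comes from the lesson text.

```sql
-- Hypothetical table for illustration; only Reg_No is named in the lesson.
CREATE TEMP TABLE Cars (
    Reg_No TEXT PRIMARY KEY,   -- values must uniquely identify a row
    Make   TEXT
);

INSERT INTO Cars (Reg_No, Make) VALUES ('AB12 CDE', 'Ford');

-- Re-using the same primary key is rejected with a UNIQUE constraint error:
-- INSERT INTO Cars (Reg_No, Make) VALUES ('AB12 CDE', 'Fiat');

-- INSERT OR IGNORE skips the duplicate row instead of raising an error.
INSERT OR IGNORE INTO Cars (Reg_No, Make) VALUES ('AB12 CDE', 'Fiat');

SELECT COUNT(*) AS n_rows FROM Cars;   -- still a single row
```

Running the commented-out `INSERT` on its own is a quick way to see the error message that SQLite produces for a duplicate key.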

02-db-browser.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -52,7 +52,7 @@ We will make a couple of initial changes to the layout of the screen. These will
5252

5353
![](fig/DB_Browser_run_2.png){alt='Data Browser Preferences'}
5454

55-
Towards the bottom there is a section dealing with Field colors. You will see three bars below the word Text, to the right there are in fact three invisible bars for the Background. Click in the area for the Background color for NULL. A colour selector window will open, select Red. The bar will turn Red. This is now the default background cell colour that will be used to display NULL values in you tables. We will discuss the meaning of NULL values in a table in a later episode.
55+
Towards the bottom there is a section dealing with Field colors. You will see three bars below the word Text, to the right there are in fact three invisible bars for the Background. Click in the area for the Background color for NULL. A colour selector window will open, select Red. The bar will turn Red. This is now the default background cell colour that will be used to display NULL values in your tables. We will discuss the meaning of NULL values in a table in a later episode.
5656

5757
You can now close the preference window by clicking OK.
5858

@@ -76,7 +76,7 @@ These are the same actions that are available from the toolbar at the top of the
7676
![](fig/DB_Browser_run_3.png){alt='Table Actions'}
7777

7878
If you select 'Browse Table', the data from the table is loaded into the 'Browse Data' pane from where it can be examined or filtered.
79-
You can also select the table you wish to Browse directly from here.
79+
You can also select the table you wish to browse directly from here.
8080

8181
There are options for 'New Record' and 'Delete Record'. As our interest is in analysing existing data not creating or deleting data, it is unlikely that you will want to use these options.
8282

@@ -97,7 +97,7 @@ The second pane has the tabular results, and the bottom pane has a message indic
9797
On the toolbar at the top there are eight buttons. Left to right they are:
9898

9999
- Open Tab (creates a new tab in the editor)
100-
- Open SQL file (allows you to load a prepared file of SQL into the editor - the tab takes the name of he file)
100+
- Open SQL file (allows you to load a prepared file of SQL into the editor - the tab takes the name of the file)
101101
- Save SQL file (allows you to save the current contents of the active pane to the local file system)
102102
- Execute SQL (Executes all of the SQL statements in the editor pane)
103103
- Execute current line (Actually executes whatever is selected)

03-select.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -172,7 +172,7 @@ WHERE B17_parents_liv = 'yes'
172172
;
173173
```
174174

175-
Notice that the columns being used in the `WHERE` clause do not need to returned as part of the `SELECT` clause.
175+
Notice that the columns being used in the `WHERE` clause do not need to be returned as part of the `SELECT` clause.
176176

177177
You can ensure the precedence of the operators by using brackets. Judicious use of brackets can also aid readability
178178
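
As a minimal sketch of the two points in this hunk (columns used only in `WHERE`, and brackets controlling precedence), the queries below run against a small temporary table. The table `Farms_demo`, the column `B16_years_liv` as used here, and all sample rows are assumptions made for illustration and are not taken from the lesson's database.

```sql
-- Hypothetical sample data; not the real SQL_SAFI Farms table.
CREATE TEMP TABLE Farms_demo (
    Id              INTEGER PRIMARY KEY,
    A09_village     TEXT,
    B17_parents_liv TEXT,
    B16_years_liv   INTEGER
);

INSERT INTO Farms_demo VALUES
    (1, 'God',      'yes',  4),
    (2, 'Chirodzo', 'no',  25),
    (3, 'God',      'no',  30);

-- AND binds more tightly than OR, so without brackets this is read as
-- village = 'God' OR (parents = 'yes' AND years > 20).
-- Only Id is returned, although three other columns are tested.
SELECT Id
FROM Farms_demo
WHERE A09_village = 'God' OR B17_parents_liv = 'yes' AND B16_years_liv > 20;

-- Brackets force the OR to be evaluated first, which changes the result.
SELECT Id
FROM Farms_demo
WHERE (A09_village = 'God' OR B17_parents_liv = 'yes') AND B16_years_liv > 20;
```

With these sample rows the first query returns Ids 1 and 3, while the second returns only Id 3.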

04-missing-data.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@ exercises: 0
2323
At the beginning of this lesson we noted that all database systems have the concept of a NULL value; Something which is missing and nothing is known about it.
2424

2525
In DB Browser we can choose how we want NULLs in a table to be displayed. When we had our initial look at DB Browser,
26-
we used the `View | Preference` option to change the background colour of cells in a table which has a `NULL` values as **red**.
26+
we used the `View | Preference` option to change the background colour of cells in a table which has `NULL` values as **red**.
2727
The example below, using the 'Browse data' tab, shows a section of the Farms table in the SQL\_SAFI database showing column values which are `NULL`.
2828

2929
![](fig/SQL_04_Nulls_01.png){alt='Farms NULLs'}
@@ -78,10 +78,10 @@ the value of `NULL` is appropriate.
7878

7979
## Dealing with missing data
8080

81-
There are several statistical techniques that can be used to allow for `NULL` values, which one you might will depend on what has caused the `NULL` value to be recorded.
81+
There are several statistical techniques that can be used to allow for `NULL` values. Which one you might use will depend on what has caused the `NULL` value to be recorded.
8282

8383
You may want to change the `NULL` value to something else. For example if we knew that the `NULL` values in the `F14_items_owned` column actually meant that the Farmer had no possessions then we
84-
might want to change the `NULL` values to '[]' to represent and empty list. We can do that in SQL with an `UPDATE` query.
84+
might want to change the `NULL` values to '[]' to represent an empty list. We can do that in SQL with an `UPDATE` query.
8585

8686
The update query is shown below. We are not going to run it as it would change our data.
8787
You need to be very sure of the effect you are going to have before you change data in this way.

05-creating-new-columns.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -65,7 +65,7 @@ Full details of the available built-in functions are available from the SQLite.o
6565

6666
We will look at some of the arithmetic and statistical functions when we deal with aggregations in a later lesson.
6767

68-
You may have noticed in the output from are last query that the number of decimal places can change from one row to another. In order to make the output
68+
You may have noticed in the output from our last query that the number of decimal places can change from one row to another. In order to make the output
6969
more tidy, we may wish to always produce the same number of decimal places, e.g. 2. We can do this using the `ROUND` function.
7070

7171
The `ROUND` function works in a similar way as its spreadsheet equivalent, you specify the value you wish to round and the required number of decimal places.
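
A minimal sketch of `ROUND` in SQLite, assuming a throwaway table named `round_demo` with made-up values (neither is part of the lesson's data):

```sql
-- Hypothetical values, used only to show the two-argument form of ROUND.
CREATE TEMP TABLE round_demo (value REAL);

INSERT INTO round_demo VALUES (3.14159), (2.71828), (10.0);

SELECT value,
       ROUND(value, 2) AS rounded_2dp,   -- value, then number of decimal places
       ROUND(value)    AS rounded_0dp    -- second argument defaults to 0
FROM round_demo;
```

Note that `ROUND` returns a floating point number, so a value such as 10.0 is not padded with extra zeros in the output.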
@@ -113,10 +113,10 @@ sometimes with different names.
113113
| substr(a,b,c) | mid(a,b,c) |
114114
| instr(a,b) | find(a,b) |
115115

116-
`instr` can be used to check a character or string of characters occurs within another string.
116+
`instr` can be used to check if a character or string of characters occurs within another string.
117117
`substr` can be used to extract a portion of a string based on a starting position and the number of characters required.
118118

119-
In the Farms table, the three columns A01\_interview\_date, A04\_start and A05\_end are all recognisable as a dates with the A04\_start and A05\_end also including times.
119+
In the Farms table, the three columns A01\_interview\_date, A04\_start and A05\_end are all recognisable as dates with the A04\_start and A05\_end also including times.
120120
These last two are automatically generated by the eSurvey software when the data is collected, i.e. they are automatically entered. The A01\_interview\_date however is manually input.
121121
In all three cases however SQLite thinks that they are all just strings of characters.
122122
We can confirm this by selecting the `Database Structure` tab and expanding the `Farms` entry and notice that the data type for all three columns is listed as 'TEXT'
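
To make the behaviour of `instr` and `substr` concrete, here is a small sketch using a literal string. The value '17/11/2016' and its day/month/year layout are assumptions chosen for illustration; check the actual format of `A01_interview_date` in your own copy of the data before relying on fixed positions.

```sql
SELECT typeof('17/11/2016')       AS stored_as,        -- 'text', not a date type
       instr('17/11/2016', '/')   AS first_slash_pos,  -- 3
       substr('17/11/2016', 1, 2) AS day_part,         -- '17'
       substr('17/11/2016', 4, 2) AS month_part,       -- '11'
       substr('17/11/2016', 7, 4) AS year_part;        -- '2016'
```

Positions in `substr` start at 1, so the third argument is a length rather than an end position.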
@@ -268,7 +268,7 @@ ORDER BY year, month, day
268268
```
269269

270270
By default the `ORDER BY` clause will sort in ascending order, smallest to
271-
biggest; we can make this explicit by usingthe `ASC` keyword. Or if we want to
271+
biggest; we can make this explicit by using the `ASC` keyword. Or if we want to
272272
sort in descending order we can use the `DESC` keyword.
273273

274274
```sql
@@ -296,7 +296,7 @@ FROM Farms
296296
;
297297
```
298298

299-
There is a more general form which allows to to perform any kind of test.
299+
There is a more general form which allows us to perform any kind of test.
300300

301301
## Using SQL syntax to create ‘binned' values
302302

06-aggregation.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ exercises: 10
2222

2323
## Using built-in statistical functions
2424

25-
Aggregate functions are used perform some kind of mathematical or statistical calculation across a group of rows. The rows in each group are determined
25+
Aggregate functions are used to perform some kind of mathematical or statistical calculation across a group of rows. The rows in each group are determined
2626
by the different values in a specified column or columns. Alternatively you can aggregate across the entire table.
2727

2828
If we wanted to know the minimum, average and maximum values of the 'A11\_years\_farm' column across the whole Farms table, we could write a query such as this;
@@ -76,7 +76,7 @@ We get
7676

7777
![](fig/SQL_06_villages.png){alt='Villages'}
7878

79-
The problem with allowing free-form text quite obvious. Having two villages, one called 'Massequece' and the other called 'Massequese' is unlikely.
79+
The problem with allowing free-form text may be quite obvious. Having two villages, one called 'Massequece' and the other called 'Massequese' is unlikely.
8080

8181
Detecting this type of problem in a large dataset can be very difficult if you are just 'eyeballing' the content. This small SQL query makes it very clear,
8282
and in the OpenRefine lesson we provide approaches to detecting and correcting such errors. SQL is not the best tool for correcting this type of error.
@@ -110,7 +110,7 @@ ORDER BY A06_province, A07_district, A08_ward, A09_village;
110110

111111
## The `GROUP BY` clause to summarise data
112112

113-
Just knowing the combinations is of limited use. You really want to know **How many** of each of the values there are.
113+
Just knowing the combinations is of limited use. You really want to know **how many** of each of the values there are.
114114
To do this we use the `GROUP BY` clause.
115115

116116
```sql
@@ -124,7 +124,7 @@ This query tells us how many records in the table have each different value in t
124124

125125
In the first example of this episode, three aggregations were performed over the single column 'A11\_years\_farm'.
126126
In addition to calculating multiple aggregation values over a single column, it is also possible to aggregate over multiple columns by specifying
127-
them in all in the `SELECT` clause **and** the `GROUP BY` clause.
127+
them all in the `SELECT` clause **and** the `GROUP BY` clause.
128128

129129
The grouping will take place based on the order of the columns listed in the `GROUP BY` clause. There will be one row returned for each unique combination of the columns mentioned in the `GROUP BY` clause
130130
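
Grouping over more than one column, as described in this hunk, might look like the sketch below. The temporary table `Farms_groups` and its rows are hypothetical; the column names `A08_ward` and `A09_village` are borrowed from the query shown in the hunk header above.

```sql
-- Hypothetical sample rows; ward and village values are invented.
CREATE TEMP TABLE Farms_groups (
    Id          INTEGER PRIMARY KEY,
    A08_ward    TEXT,
    A09_village TEXT
);

INSERT INTO Farms_groups VALUES
    (1, 'Ward A', 'Village 1'),
    (2, 'Ward A', 'Village 1'),
    (3, 'Ward A', 'Village 2'),
    (4, 'Ward B', 'Village 3');

-- The grouping columns appear in both the SELECT and the GROUP BY clause;
-- one row is returned per unique (ward, village) combination.
SELECT A08_ward,
       A09_village,
       COUNT(*) AS n_farms
FROM Farms_groups
GROUP BY A08_ward, A09_village
ORDER BY A08_ward, A09_village;
```

With these four rows the query returns three groups, the first of which has a count of 2.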

07-creating-tables-views.md

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -77,16 +77,16 @@ If any of the datatypes are not as expected or wanted we can change them.
7777
In this particular case DB Browser correctly selected the datatypes. Notice that the `A01_interview_date` was allocated a datatype of 'TEXT'. This isn't a problem
7878
as we have to use the Date and Time functions to manipulate dates anyway.
7979

80-
Notice that the bottom pane in the Window shows the SQL DDL statement that would create the table that you modifying.
80+
Notice that the bottom pane in the Window shows the SQL DDL statement that would create the table that you are modifying.
8181

8282
When you change one of the columns from TEXT to INTEGER, this is immediately reflected in the Create Table statement.
8383
It is slightly misleading because in fact we are modifying an existing table and in SQL-speak, this would be an **Alter Table...** statement.
8484
However it does illustrate quite well the fact that whatever you do in the GUI, it is essentially translated into an SQL statement and executed.
85-
You could copy and paste this definition into the SQL editor and if you change the table name before you ran it, you would create a new table with that name.
85+
You could copy and paste this definition into the SQL editor and if you changed the table name before you ran it, you would create a new table with that name.
8686
This new table would have no data in it. This is how the insert table wizard works. It uses the header row from your data to create a `CREATE TABLE` statement which it runs.
8787
It then transforms each of the rows of data into SQL `INSERT INTO...` statements which it also runs to get the data into the table.
8888

89-
In addition to changing the data types there are several other options which can be set when you are creating of modifying a table.
89+
In addition to changing the data types there are several other options which can be set when you are creating or modifying a table.
9090
For our tables we don't need to make use of them but for completeness we will describe what they are;
9191

9292
**PK** - Or Primary Key, a unique identifier for the row. In the Farms table, there is an `Id` column which uniquely identifies a Farm.
@@ -98,7 +98,7 @@ This could act as a unique identifier for the row as a whole. We could mark thi
9898

9999
In real datasets missing values are quite common and we have already looked at ways of dealing with them when they occur in tables. If you were to **check** this box and the data did have missing values for this column, the record from the file would be rejected and the load of the file will fail.
100100

101-
**U** - Or Unique. This allows you to say that the contents of the column, which is not the primary key column has to have unique values in it. Like Allow Null this is another way of providing some data validation as the data is imported. Although it doesn't really apply with the DB Browser import wizard as the data is imported before you are allowed to set this option.
101+
**U** - Or Unique. This allows you to say that the contents of the column, which is not the primary key column, has to have unique values in it. Like Allow Null, this is another way of providing some data validation as the data is imported (although it doesn't really apply with the DB Browser import wizard as the data is imported before you are allowed to set this option).
102102

103103
**Default** - This is used in conjunction with 'Not Null', if a value is not provided in the dataset, then if provided, the default value for that column will be used.
104104
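
The column options described in this hunk correspond to standard SQL constraints. The sketch below shows how they could be written by hand in SQLite; the table `constraints_demo` and its columns are hypothetical, and this is not the statement that DB Browser itself generates.

```sql
-- Hypothetical table showing PK, Not Null, U (Unique) and Default together.
CREATE TEMP TABLE constraints_demo (
    Id      INTEGER PRIMARY KEY,               -- PK
    Reg_No  TEXT NOT NULL UNIQUE,              -- Not Null and U
    Country TEXT NOT NULL DEFAULT 'Unknown'    -- Default used when no value is given
);

-- Country is omitted, so the default value is stored.
INSERT INTO constraints_demo (Reg_No) VALUES ('AB12 CDE');

-- Each of these would be rejected if uncommented:
-- INSERT INTO constraints_demo (Reg_No) VALUES ('AB12 CDE');  -- breaks UNIQUE
-- INSERT INTO constraints_demo (Reg_No) VALUES (NULL);        -- breaks NOT NULL

SELECT Id, Reg_No, Country FROM constraints_demo;
```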

@@ -133,7 +133,7 @@ line added.
133133

134134
## Creating a table using an SQL command
135135

136-
You could copy and paste this definition into the SQL editor and if you change the table name before you ran it, you would create a new table with that name.
136+
You could copy and paste this definition into the SQL editor and if you changed the table name before you ran it, you would create a new table with that name.
137137
This new table would have no data in it. This is how the insert table wizard works. It uses the header row from your data to create a `CREATE TABLE` statement which it runs.
138138
It then transforms each of the rows of data into SQL `INSERT INTO...` statements which it also runs to get the data into the table.
139139
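
The two steps described here, a `CREATE TABLE` built from the header row followed by one `INSERT INTO` per data row, can be sketched by hand as below. The table name `Farms_import`, the choice of three columns and the sample values are assumptions for illustration; the statements that DB Browser actually generates may differ.

```sql
-- Step 1: a table definition derived from a (hypothetical) header row.
CREATE TEMP TABLE Farms_import (
    Id                 INTEGER,
    A01_interview_date TEXT,
    A09_village        TEXT
);

-- Step 2: each data row becomes one INSERT INTO statement.
INSERT INTO Farms_import (Id, A01_interview_date, A09_village)
VALUES (1, '17/11/2016', 'Village 1');

INSERT INTO Farms_import (Id, A01_interview_date, A09_village)
VALUES (2, '18/11/2016', 'Village 2');

-- The new table now holds the two rows that were inserted.
SELECT * FROM Farms_import;
```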

@@ -172,7 +172,7 @@ SELECT Id,
172172
FROM Farms;
173173
```
174174

175-
If we wanted to create a table from the Crops table which contains only the rows where the D\_curr\_crop value was 'rice' we could use a query like this:
175+
If we wanted to create a table from the Crops table, which contains only the rows where the D\_curr\_crop value was 'rice' we could use a query like this:
176176

177177
```sql
178178
CREATE TABLE crops_rice AS
@@ -215,7 +215,7 @@ The advantage of using Views is that it allows you to restrict how you see the d
215215
In the example we used above it may be far easier to work with only the 6 columns that we need from the full Farms table
216216
rather than the full table with 61 columns.
217217

218-
A View isn't restricted to simple `SELECT` statements it can be the result of aggregations and joins as well.
218+
A View isn't restricted to simple `SELECT` statements. It can be the result of aggregations and joins as well.
219219
This can help reduce the complexity of queries based on the View and so aid readability.
220220
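
As a sketch of a View built on an aggregation, the example below defines a temporary view over a small hypothetical table and then queries it like an ordinary table. The names `Farms_view_demo` and `village_summary` and the sample rows are invented for illustration; `A09_village` and `A11_years_farm` are column names mentioned elsewhere in the lesson.

```sql
-- Hypothetical sample data standing in for the Farms table.
CREATE TEMP TABLE Farms_view_demo (
    Id             INTEGER PRIMARY KEY,
    A09_village    TEXT,
    A11_years_farm INTEGER
);

INSERT INTO Farms_view_demo VALUES
    (1, 'Village 1', 10),
    (2, 'Village 1', 20),
    (3, 'Village 2',  5);

-- The aggregation is written once, inside the View definition.
CREATE TEMP VIEW village_summary AS
SELECT A09_village,
       COUNT(*)            AS n_farms,
       AVG(A11_years_farm) AS avg_years_farm
FROM Farms_view_demo
GROUP BY A09_village;

-- Queries against the View stay short and readable.
SELECT A09_village, avg_years_farm
FROM village_summary
WHERE n_farms > 1;
```

With these sample rows only 'Village 1' is returned, with an average of 15.0 years.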

221221
:::::::::::::::::::::::::::::::::::::::: keypoints
