01-relational-database.md
2 additions & 2 deletions
@@ -36,7 +36,7 @@ Databases are designed to allow efficient querying against very large tables, mo
 
 ## What is a table?
 
-As were have noted above, a single table is very much like a spreadsheet. It has rows and it has columns. A row represents a single observation and the columns represents the various variables contained within that observation.
+As we have noted above, a single table is very much like a spreadsheet. It has rows and it has columns. A row represents a single observation and the columns represent the various variables contained within that observation.
 Often one or more columns in a row will be designated as a 'primary key' This column or combination of columns can be used to uniquely identify a specific row in the table.
 The columns typically have a name associated with them indicating the variable name. A column always represents the same variable for each row contained in the table. Because of this the data in each column will always be of the same *type*, such as an Integer or Text, of values for all of the rows in the table. Datatypes are discussed in the next section.
 
@@ -108,7 +108,7 @@ for these and use the built-in Date And Time Functions to manipulate them. We wi
 
 ## Why do tables have primary key columns?
 
-Whenever you create a table, you will have the option of designating one of the columns as the primary key column. The main property of the primary key column is that the values contained in it must uniquely identify that particular row. That is you cannot have duplicate primary keys. This can be an advantage which adding rows to the table as you will not be allowed to add the same row (or a row with the same primary key) twice.
+Whenever you create a table, you will have the option of designating one of the columns as the primary key column. The main property of the primary key column is that the values contained in it must uniquely identify that particular row. That is you cannot have duplicate primary keys. This can be an advantage when adding rows to the table as you will not be allowed to add the same row (or a row with the same primary key) twice.
 
 
 The primary key column for a table is usually of type Integer although you could have Text. For example if you had a table of car information, then the "Reg\_No" column could be made the primary key as it can be used to uniquely identify a particular row in the table.
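
The duplicate-key behaviour described in this hunk can be tried out with a small sketch. It uses Python's standard `sqlite3` module and an in-memory database; the `Cars` table, its `Make` column and the sample registration number are assumptions made for illustration and are not part of the lesson data.

```python
import sqlite3

# Hypothetical in-memory table; "Cars", "Make" and the values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cars (Reg_No TEXT PRIMARY KEY, Make TEXT)")
conn.execute("INSERT INTO Cars VALUES ('AB12 CDE', 'Ford')")

try:
    # Inserting a second row with the same primary key value is rejected.
    conn.execute("INSERT INTO Cars VALUES ('AB12 CDE', 'Ford')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Only the first row is stored.
row_count = conn.execute("SELECT COUNT(*) FROM Cars").fetchone()[0]
print(duplicate_rejected, row_count)
```

SQLite reports the rejected insert as a constraint error, which `sqlite3` surfaces as `IntegrityError`; the table still holds a single row afterwards.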
-Towards the bottom there is a section dealing with Field colors. You will see three bars below the word Text, to the right there are in fact three invisible bars for the Background. Click in the area for the Background color for NULL. A colour selector window will open, select Red. The bar will turn Red. This is now the default background cell colour that will be used to display NULL values in you tables. We will discuss the meaning of NULL values in a table in a later episode.
+Towards the bottom there is a section dealing with Field colors. You will see three bars below the word Text, to the right there are in fact three invisible bars for the Background. Click in the area for the Background color for NULL. A colour selector window will open, select Red. The bar will turn Red. This is now the default background cell colour that will be used to display NULL values in your tables. We will discuss the meaning of NULL values in a table in a later episode.
 
 You can now close the preference window by clicking OK.
 
@@ -76,7 +76,7 @@ These are the same actions that are available from the toolbar at the top of the
 If you select 'Browse Table', the data from the table is loaded into the 'Browse Data' pane from where it can be examined or filtered.
-You can also select the table you wish to Browse directly from here.
+You can also select the table you wish to browse directly from here.
 
 There are options for 'New Record' and 'Delete Record'. As our interest is in analysing existing data not creating or deleting data, it is unlikely that you will want to use these options.
 
@@ -97,7 +97,7 @@ The second pane has the tabular results, and the bottom pane has a message indic
 On the toolbar at the top there are eight buttons. Left to right they are:
 
 - Open Tab (creates a new tab in the editor)
-- Open SQL file (allows you to load a prepared file of SQL into the editor - the tab takes the name of he file)
+- Open SQL file (allows you to load a prepared file of SQL into the editor - the tab takes the name of the file)
 - Save SQL file (allows you to save the current contents of the active pane to the local file system)
 - Execute SQL (Executes all of the SQL statements in the editor pane)
 - Execute current line (Actually executes whatever is selected)
04-missing-data.md
3 additions & 3 deletions
@@ -23,7 +23,7 @@ exercises: 0
 At the beginning of this lesson we noted that all database systems have the concept of a NULL value; Something which is missing and nothing is known about it.
 
 In DB Browser we can choose how we want NULLs in a table to be displayed. When we had our initial look at DB Browser,
-we used the `View | Preference` option to change the background colour of cells in a table which has a `NULL` values as **red**.
+we used the `View | Preference` option to change the background colour of cells in a table which has `NULL` values as **red**.
 The example below, using the 'Browse data' tab, shows a section of the Farms table in the SQL\_SAFI database showing column values which are `NULL`.
 
 {alt='Farms NULLs'}
@@ -78,10 +78,10 @@ the value of `NULL` is appropriate.
 
 ## Dealing with missing data
 
-There are several statistical techniques that can be used to allow for `NULL` values, which one you might will depend on what has caused the `NULL` value to be recorded.
+There are several statistical techniques that can be used to allow for `NULL` values. Which one you might use will depend on what has caused the `NULL` value to be recorded.
 
 You may want to change the `NULL` value to something else. For example if we knew that the `NULL` values in the `F14_items_owned` column actually meant that the Farmer had no possessions then we
-might want to change the `NULL` values to '[]' to represent and empty list. We can do that in SQL with an `UPDATE` query.
+might want to change the `NULL` values to '[]' to represent an empty list. We can do that in SQL with an `UPDATE` query.
 
 The update query is shown below. We are not going to run it as it would change our data.
 You need to be very sure of the effect you are going to have before you change data in this way.
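
The lesson's own `UPDATE` query falls outside this hunk. As a rough sketch of the idea only, the following runs an update of that kind against a throwaway in-memory copy, so no real data is touched. The two-column `Farms` table and its three rows are made up for illustration; only the column name `F14_items_owned` and the replacement value '[]' come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for the Farms table; the rows below are invented sample values.
conn.execute("CREATE TABLE Farms (Id INTEGER PRIMARY KEY, F14_items_owned TEXT)")
conn.executemany(
    "INSERT INTO Farms VALUES (?, ?)",
    [(1, "['bicycle', 'radio']"), (2, None), (3, None)],
)

# Count the affected rows first, so the effect of the UPDATE is known in advance.
nulls_before = conn.execute(
    "SELECT COUNT(*) FROM Farms WHERE F14_items_owned IS NULL"
).fetchone()[0]

# NULL has to be tested with IS NULL; '= NULL' never matches any row.
cursor = conn.execute(
    "UPDATE Farms SET F14_items_owned = '[]' WHERE F14_items_owned IS NULL"
)
rows_changed = cursor.rowcount

nulls_after = conn.execute(
    "SELECT COUNT(*) FROM Farms WHERE F14_items_owned IS NULL"
).fetchone()[0]
print(nulls_before, rows_changed, nulls_after)
```

Running the `SELECT COUNT(*)` with the same `WHERE` clause before the `UPDATE` is one simple way of checking how many rows a change will touch.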
05-creating-new-columns.md
5 additions & 5 deletions
@@ -65,7 +65,7 @@ Full details of the available built-in functions are available from the SQLite.o
 
 We will look at some of the arithmetic and statistical functions when we deal with aggregations in a later lesson.
 
-You may have noticed in the output from are last query that the number of decimal places can change from one row to another. In order to make the output
+You may have noticed in the output from our last query that the number of decimal places can change from one row to another. In order to make the output
 more tidy, we may wish to always produce the same number of decimal places, e.g. 2. We can do this using the `ROUND` function.
 
 The `ROUND` function works in a similar way as its spreadsheet equivalent, you specify the value you wish to round and the required number of decimal places.
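
A minimal sketch of `ROUND` in SQLite, run through Python's `sqlite3` module. The `Measurements` table and its values are hypothetical and stand in for whatever calculated column the lesson query produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of values with differing numbers of decimal places.
conn.execute("CREATE TABLE Measurements (v REAL)")
conn.executemany("INSERT INTO Measurements VALUES (?)", [(1.5,), (2.25,), (3.14159,)])

# ROUND(value, places): the second argument is the number of decimal places.
rounded = [
    row[0]
    for row in conn.execute("SELECT ROUND(v, 2) FROM Measurements ORDER BY v")
]
print(rounded)
```

Note that `ROUND` caps the number of decimal places and does not pad with zeros, so a value such as 1.5 is still returned as 1.5 rather than 1.50.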
@@ -113,10 +113,10 @@ sometimes with different names.
 | substr(a,b,c) | mid(a,b,c) |
 | instr(a,b) | find(a,b) |
 
-`instr` can be used to check a character or string of characters occurs within another string.
+`instr` can be used to check if a character or string of characters occurs within another string.
 `substr` can be used to extract a portion of a string based on a starting position and the number of characters required.
 
-In the Farms table, the three columns A01\_interview\_date, A04\_start and A05\_end are all recognisable as a dates with the A04\_start and A05\_end also including times.
+In the Farms table, the three columns A01\_interview\_date, A04\_start and A05\_end are all recognisable as dates with the A04\_start and A05\_end also including times.
 These last two are automatically generated by the eSurvey software when the data is collected, i.e. they are automatically entered. The A01\_interview\_date however is manually input.
 In all three cases however SQLite thinks that they are all just strings of characters.
 We can confirm this by selecting the `Database Structure` tab and expanding the `Farms` entry and notice that the data type for all three columns is listed as 'TEXT'
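
To make `instr` and `substr` concrete, here is a small sketch run through Python's `sqlite3` module. The date string '17/11/2016' is an invented sample and is only assumed to resemble a manually entered date; the actual format used in the Farms table is not shown in this diff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    """
    SELECT instr(d, '/'),     -- 1-based position of the first '/', 0 if absent
           substr(d, 1, 2),   -- 2 characters starting at position 1
           substr(d, 7, 4),   -- 4 characters starting at position 7
           typeof(d)          -- SQLite treats the value as plain text
    FROM (SELECT '17/11/2016' AS d)  -- made-up sample value
    """
).fetchone()
print(row)
```

Both functions count positions from 1, not 0, which matches the spreadsheet functions listed in the table above.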
@@ -268,7 +268,7 @@ ORDER BY year, month, day
 ```
 
 By default the `ORDER BY` clause will sort in ascending order, smallest to
-biggest; we can make this explicit by usingthe`ASC` keyword. Or if we want to
+biggest; we can make this explicit by using the`ASC` keyword. Or if we want to
 sort in descending order we can use the `DESC` keyword.
 
 ```sql
@@ -296,7 +296,7 @@ FROM Farms
 ;
 ```
 
-There is a more general form which allows to to perform any kind of test.
+There is a more general form which allows us to perform any kind of test.
06-aggregation.md
4 additions & 4 deletions
@@ -22,7 +22,7 @@ exercises: 10
 
 ## Using built-in statistical functions
 
-Aggregate functions are used perform some kind of mathematical or statistical calculation across a group of rows. The rows in each group are determined
+Aggregate functions are used to perform some kind of mathematical or statistical calculation across a group of rows. The rows in each group are determined
 by the different values in a specified column or columns. Alternatively you can aggregate across the entire table.
 
 If we wanted to know the minimum, average and maximum values of the 'A11\_years\_farm' column across the whole Farms table, we could write a query such as this;
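
The lesson's own query is not part of this hunk. As a sketch of a whole-table aggregation of the kind described, the following uses Python's `sqlite3` module with a toy `Farms` table; the three `A11_years_farm` values are invented so that the result is easy to check by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for the Farms table with made-up values.
conn.execute("CREATE TABLE Farms (Id INTEGER PRIMARY KEY, A11_years_farm INTEGER)")
conn.executemany(
    "INSERT INTO Farms (A11_years_farm) VALUES (?)", [(2,), (10,), (18,)]
)

# With no GROUP BY clause, the aggregates are computed across the whole table
# and a single row is returned.
result = conn.execute(
    """
    SELECT min(A11_years_farm),
           avg(A11_years_farm),
           max(A11_years_farm)
    FROM Farms
    """
).fetchone()
print(result)
```

`avg` returns a floating point value even when every input is an integer, while `min` and `max` keep the type of the column values.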
@@ -76,7 +76,7 @@ We get
 
 {alt='Villages'}
 
-The problem with allowing free-form text quite obvious. Having two villages, one called 'Massequece' and the other called 'Massequese' is unlikely.
+The problem with allowing free-form text may be quite obvious. Having two villages, one called 'Massequece' and the other called 'Massequese' is unlikely.
 
 Detecting this type of problem in a large dataset can be very difficult if you are just 'eyeballing' the content. This small SQL query makes it very clear,
 and in the OpenRefine lesson we provide approaches to detecting and correcting such errors. SQL is not the best tool for correcting this type of error.
@@ -110,7 +110,7 @@ ORDER BY A06_province, A07_district, A08_ward, A09_village;
 
 ## The `GROUP BY` clause to summarise data
 
-Just knowing the combinations is of limited use. You really want to know **How many** of each of the values there are.
+Just knowing the combinations is of limited use. You really want to know **how many** of each of the values there are.
 To do this we use the `GROUP BY` clause.
 
 ```sql
@@ -124,7 +124,7 @@ This query tells us how many records in the table have each different value in t
 
 In the first example of this episode, three aggregations were performed over the single column 'A11\_years\_farm'.
 In addition to calculating multiple aggregation values over a single column, it is also possible to aggregate over multiple columns by specifying
-them in all in the `SELECT` clause **and** the `GROUP BY` clause.
+them all in the `SELECT` clause **and** the `GROUP BY` clause.
 
 The grouping will take place based on the order of the columns listed in the `GROUP BY` clause. There will be one row returned for each unique combination of the columns mentioned in the `GROUP BY` clause
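
Grouping over more than one column can be sketched as follows, again via Python's `sqlite3` module. The column names `A06_province` and `A09_village` appear in the lesson's queries, but the placeholder values 'P1', 'P2', 'V1' and 'V2' are invented and do not correspond to real survey data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for the Farms table; the province and village values are placeholders.
conn.execute("CREATE TABLE Farms (A06_province TEXT, A09_village TEXT)")
conn.executemany(
    "INSERT INTO Farms VALUES (?, ?)",
    [("P1", "V1"), ("P1", "V1"), ("P1", "V2"), ("P2", "V1")],
)

# Both columns are listed in the SELECT clause and in the GROUP BY clause,
# so one row comes back for each unique (province, village) combination.
counts = conn.execute(
    """
    SELECT A06_province, A09_village, count(*) AS n
    FROM Farms
    GROUP BY A06_province, A09_village
    ORDER BY A06_province, A09_village
    """
).fetchall()
print(counts)
```

The explicit `ORDER BY` is there so that the row order does not depend on how SQLite happens to process the groups.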
07-creating-tables-views.md
7 additions & 7 deletions
@@ -77,16 +77,16 @@ If any of the datatypes are not as expected or wanted we can change them.
 In this particular case DB Browser correctly selected the datatypes. Notice that the `A01_interview_date` was allocated a datatype of 'TEXT'. This isn't a problem
 as we have to use the Date and Time functions to manipulate dates anyway.
 
-Notice that the bottom pane in the Window shows the SQL DDL statement that would create the table that you modifying.
+Notice that the bottom pane in the Window shows the SQL DDL statement that would create the table that you are modifying.
 
 When you change one of the columns from TEXT to INTEGER, this is immediately reflected in the Create Table statement.
 It is slightly misleading because in fact we are modifying an existing table and in SQL-speak, this would be an **Alter Table...** statement.
 However it does illustrate quite well the fact that whatever you do in the GUI, it is essentially translated into an SQL statement and executed.
-You could copy and paste this definition into the SQL editor and if you change the table name before you ran it, you would create a new table with that name.
+You could copy and paste this definition into the SQL editor and if you changed the table name before you ran it, you would create a new table with that name.
 This new table would have no data in it. This is how the insert table wizard works. It uses the header row from your data to create a `CREATE TABLE` statement which it runs.
 It then transforms each of the rows of data into SQL `INSERT INTO...` statements which it also runs to get the data into the table.
 
-In addition to changing the data types there are several other options which can be set when you are creating of modifying a table.
+In addition to changing the data types there are several other options which can be set when you are creating or modifying a table.
 For our tables we don't need to make use of them but for completeness we will describe what they are;
 
 **PK** - Or Primary Key, a unique identifier for the row. In the Farms table, there is an `Id` column which uniquely identifies a Farm.
@@ -98,7 +98,7 @@ This could act as a unique identifier for the row as a whole. We could mark thi
 
 In real datasets missing values are quite common and we have already looked at ways of dealing with them when they occur in tables. If you were to **check** this box and the data did have missing values for this column, the record from the file would be rejected and the load of the file will fail.
 
-**U** - Or Unique. This allows you to say that the contents of the column, which is not the primary key column has to have unique values in it. Like Allow Null this is another way of providing some data validation as the data is imported. Although it doesn't really apply with the DB Browser import wizard as the data is imported before you are allowed to set this option.
+**U** - Or Unique. This allows you to say that the contents of the column, which is not the primary key column, has to have unique values in it. Like Allow Null, this is another way of providing some data validation as the data is imported (although it doesn't really apply with the DB Browser import wizard as the data is imported before you are allowed to set this option).
 
 **Default** - This is used in conjunction with 'Not Null', if a value is not provided in the dataset, then if provided, the default value for that column will be used.
 
@@ -133,7 +133,7 @@ line added.
 
 ## Creating a table using an SQL command
 
-You could copy and paste this definition into the SQL editor and if you change the table name before you ran it, you would create a new table with that name.
+You could copy and paste this definition into the SQL editor and if you changed the table name before you ran it, you would create a new table with that name.
 This new table would have no data in it. This is how the insert table wizard works. It uses the header row from your data to create a `CREATE TABLE` statement which it runs.
 It then transforms each of the rows of data into SQL `INSERT INTO...` statements which it also runs to get the data into the table.
 
@@ -172,7 +172,7 @@ SELECT Id,
 FROM Farms;
 ```
 
-If we wanted to create a table from the Crops table which contains only the rows where the D\_curr\_crop value was 'rice' we could use a query like this:
+If we wanted to create a table from the Crops table, which contains only the rows where the D\_curr\_crop value was 'rice' we could use a query like this:
 
 ```sql
 CREATE TABLE crops_rice AS
@@ -215,7 +215,7 @@ The advantage of using Views is that it allows you to restrict how you see the d
 In the example we used above it may be far easier to work with only the 6 columns that we need from the full Farms table
 rather than the full table with 61 columns.
 
-A View isn't restricted to simple `SELECT` statements it can be the result of aggregations and joins as well.
+A View isn't restricted to simple `SELECT` statements. It can be the result of aggregations and joins as well.
 This can help reduce the complexity of queries based on the View and so aid readability.
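
As an illustration of a View built on an aggregation, here is a minimal sketch using Python's `sqlite3` module. The view name `Village_summary`, the reduced `Farms` table and all of its values are hypothetical; only the column names `A09_village` and `A11_years_farm` are taken from the lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for the Farms table with made-up values.
conn.execute("CREATE TABLE Farms (A09_village TEXT, A11_years_farm INTEGER)")
conn.executemany(
    "INSERT INTO Farms VALUES (?, ?)", [("V1", 4), ("V1", 6), ("V2", 10)]
)

# The view stores the query, not the data; "Village_summary" is a hypothetical name.
conn.execute(
    """
    CREATE VIEW Village_summary AS
    SELECT A09_village,
           count(*) AS n_farms,
           avg(A11_years_farm) AS mean_years
    FROM Farms
    GROUP BY A09_village
    """
)

# Queries against the view stay short even though the underlying query aggregates.
summary = conn.execute(
    "SELECT * FROM Village_summary ORDER BY A09_village"
).fetchall()

# Because the view is re-evaluated on each query, new rows show up automatically.
conn.execute("INSERT INTO Farms VALUES ('V2', 20)")
updated = conn.execute(
    "SELECT n_farms, mean_years FROM Village_summary WHERE A09_village = 'V2'"
).fetchone()
print(summary, updated)
```

This is the main practical difference from `CREATE TABLE ... AS`, which copies the rows once and does not follow later changes to the source table.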