
Commit 1de04b1

Merge pull request #136 from LibraryCarpentry/feat/add-quarto-episode
Add 'Reproducible Reports with Quarto' Episode
2 parents 0c6c8c4 + 9b031eb commit 1de04b1

4 files changed

Lines changed: 291 additions & 5 deletions

File tree

config.yaml

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -64,6 +64,7 @@ episodes:
6464
- 02-starting-with-data.Rmd
6565
- 03-data-cleaning-and-transformation.Rmd
6666
- 04-data-viz-ggplot.Rmd
67+
- 05-reproducible-reports.Rmd
6768

6869
# Information for Learners
6970
learners:

episodes/01-intro-to-r.Rmd

Lines changed: 2 additions & 3 deletions
@@ -99,9 +99,8 @@ the Environment pane in the upper right, you will notice a new object,
9999
Here are some tips for naming objects in R:
100100

101101
- **Do not use names of functions that already exist in R:** There are some
102-
names that cannot be used because they are the names of fundamental functions in
103-
R (e.g., `if`, `else`, `for`, see
104-
[here](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Reserved.html)
102+
names that cannot be used because they are the names of fundamental functions in R (e.g., `if`, `else`, `for`; see the
103+
[list of reserved words](https://stat.ethz.ch/R-manual/R-devel/library/base/html/Reserved.html)
105104
for a complete list). In general, even if it's allowed, it's best to not use
106105
other function names (e.g., `c`, `T`, `mean`, `data`, `df`, `weights`). If in
107106
doubt, check the help to see if the name is already in use.

episodes/04-data-viz-ggplot.Rmd

Lines changed: 2 additions & 2 deletions
@@ -60,7 +60,7 @@ library(lubridate) # load lubridate
6060
The `lubridate` package is installed with the tidyverse, but is not one of the core tidyverse packages loaded with `library(tidyverse)`, so it needs to be loaded explicitly. `lubridate` makes working with dates and times easier in R.
6161

6262
We also load the `books_reformatted` data we saved in the previous
63-
lesson. Here, we'll assign it to `books2`. You can create the reformatted data by running the codes from [here](https://librarycarpentry.github.io/lc-r/03-data-cleaning-and-transformation.html#exporting-data) or by loading the saved CSV file from previous episode.
63+
lesson. Here, we'll assign it to `books2`. You can create the reformatted data by running the [code for exporting data](https://librarycarpentry.github.io/lc-r/03-data-cleaning-and-transformation.html#exporting-data) or by loading the saved CSV file from the previous episode.
6464

6565
```{r, purl=FALSE, eval=FALSE}
6666
books2 <- read_csv("data_output/books_reformatted.csv") # load the data and assign it to books2
@@ -109,7 +109,7 @@ ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>()
109109
- The `mapping` argument defines the variables mapped to various aesthetics of the plot, e.g. the x and y axis.
110110
- The `geom_function` argument defines the type of plot, e.g. barplot, scatter plot, boxplot.
111111

112-
::::::::::::::::::::::::::::::::::::::::::::; callout
112+
:::::::::::::::::::::::::::::::::::::::::::: callout
113113

114114
## `ggplot2` versus `ggplot`
115115

Lines changed: 286 additions & 0 deletions
@@ -0,0 +1,286 @@
1+
---
2+
title: Reproducible Reports with Quarto
3+
teaching: 45
4+
exercises: 15
5+
source: Rmd
6+
---
7+
8+
::::::::::::::::::::::::::::::::::::::: objectives
9+
10+
- Describe the value of reproducible reporting.
11+
- Create a new Quarto document (`.qmd`) in RStudio.
12+
- Use Markdown syntax to format text.
13+
- Create and run code chunks within a Quarto document.
14+
- Render a Quarto document to an HTML report.
15+
16+
::::::::::::::::::::::::::::::::::::::::::::::::::
17+
18+
:::::::::::::::::::::::::::::::::::::::: questions
19+
20+
- How can I combine my code, results, and narrative into a single document?
21+
- How can I automatically update my reports when my data changes?
22+
- What is Quarto and how does it differ from a standard R script?
23+
24+
::::::::::::::::::::::::::::::::::::::::::::::::::
25+
26+
```{r, include=FALSE}
27+
source("files/download_data.R")
28+
library(tidyverse)
29+
# Read raw data and apply cleaning steps from previous episodes
30+
books2 <- read_csv("data/books.csv") %>%
31+
rename(
32+
title = X245.ab,
33+
author = X245.c,
34+
callnumber = CALL...BIBLIO.,
35+
isbn = ISN,
36+
pubyear = X008.Date.One,
37+
subCollection = BCODE1,
38+
format = BCODE2,
39+
location = LOCATION,
40+
tot_chkout = TOT.CHKOUT,
41+
loutdate = LOUTDATE,
42+
subject = SUBJECT,
43+
callnumber2 = CALL...ITEM.
44+
) %>%
45+
mutate(
46+
pubyear = as.integer(pubyear),
47+
subCollection = recode(subCollection,
48+
"-" = "general collection",
49+
u = "government documents",
50+
r = "reference",
51+
b = "k-12 materials",
52+
j = "juvenile",
53+
s = "special collections",
54+
c = "computer files",
55+
t = "theses",
56+
a = "archives",
57+
z = "reserves"
58+
),
59+
format = recode(format,
60+
a = "book",
61+
e = "serial",
62+
w = "microform",
63+
s = "e-gov doc",
64+
o = "map",
65+
n = "database",
66+
k = "cd-rom",
67+
m = "image",
68+
"5" = "kit/object",
69+
"4" = "online video"
70+
)
71+
)
72+
```
73+
74+
## Introduction to Reproducible Reporting
75+
76+
So far, we have been writing code in `.R` scripts. This is excellent for data analysis, but what happens when you need to share your findings with a colleague or a library director? You might copy a plot into a Word document or an email, then type out your interpretation.
77+
78+
But what if the data changes next month? You would have to re-run your script, re-save the plot, copy it back into Word, and update your text. This manual process is tedious and error-prone.
79+
80+
**Quarto** allows you to combine your code, its output (plots, tables), and your narrative text into a single document. When you "render" the document, R runs the code and produces a polished report (HTML, PDF, or Word) automatically.
81+
82+
## Creating a Quarto Document
83+
84+
To create a new Quarto document in RStudio:
85+
86+
1. Click the **File** menu.
87+
2. Select **New File** > **Quarto Document...**
88+
3. In the dialog box, give your document a **Title** (e.g., "Library Usage Report") and enter your name as **Author**.
89+
4. Ensure **HTML** is selected as the output format.
90+
5. Click **Create**.
91+
92+
RStudio will open a new file with some example content. Notice the file extension is `.qmd`.
93+
94+
::::::::::::::::::::::::::::::::::::::::: callout
95+
96+
## Quarto vs. R Markdown
97+
98+
If you have used R before, you might be familiar with R Markdown (`.Rmd`). Quarto (`.qmd`) is the next-generation version of R Markdown. It works very similarly but supports more languages (like Python and Julia) and has better features for scientific publishing.
99+
100+
:::::::::::::::::::::::::::::::::::::::::
101+
102+
## Anatomy of a Quarto Document
103+
104+
A Quarto document has three main parts:
105+
106+
### 1. The YAML Header
107+
108+
At the very top, enclosed between two lines of `---`, is the **YAML Header**. This contains metadata about the document.
109+
110+
```yaml
111+
---
112+
title: "Library Usage Report"
113+
author: "Your Name"
114+
format: html
115+
---
116+
```
117+
118+
### 2. Markdown Text
119+
120+
The area below the YAML header and outside the code chunks is where you write your narrative. You use **Markdown** syntax to format text.
121+
122+
- `**Bold**` for **bold text**
123+
- `*Italics*` for *italics*
124+
- `# Heading 1` for a main title
125+
- `## Heading 2` for a section title
126+
- `- List item` for bullet points
127+
128+
### 3. Code Chunks
129+
130+
Code chunks are where your R code lives. They start with ` ```{r} ` and end with ` ``` `.
131+
132+
````
133+
```{r}
134+
# This is a code chunk
135+
summary(cars)
136+
```
137+
````
138+
139+
You can insert a new chunk by clicking the **+C** button in the editor toolbar, or by pressing <kbd>Ctrl</kbd>+<kbd>Alt</kbd>+<kbd>I</kbd> (Windows/Linux) or <kbd>Cmd</kbd>+<kbd>Option</kbd>+<kbd>I</kbd> (Mac).
140+
141+
## Your First Report
142+
143+
Let's clean up the example file and create a report using our `books` data.
144+
145+
1. Delete everything in the file *below* the YAML header.
146+
2. Add a new **setup** code chunk to load our libraries and prepare the data.
147+
148+
```{{r}}
149+
#| label: setup
150+
#| include: false
151+
152+
library(tidyverse)
153+
154+
# Load data and rename columns for clarity
155+
books2 <- read_csv("data/books.csv") %>%
156+
rename(
157+
subCollection = BCODE1,
158+
tot_chkout = TOT.CHKOUT,
159+
format = BCODE2
160+
) %>%
161+
mutate(
162+
subCollection = recode(subCollection,
163+
"-" = "general collection",
164+
j = "juvenile",
165+
b = "k-12 materials"
166+
)
167+
)
168+
```
169+
170+
::::::::::::::::::::::::::::::::::::::::: callout
171+
172+
## Chunk Options
173+
174+
Notice the lines starting with `#|`. These are **chunk options**.
175+
- `#| label: setup` gives the chunk a name.
176+
- `#| include: false` runs the code but hides the code and output from the final report. This is great for loading data silently.
177+
178+
:::::::::::::::::::::::::::::::::::::::::
179+
180+
### Adding Analysis
181+
182+
Now, let's add a section header and some text.
183+
184+
```markdown
185+
## High Usage Items
186+
187+
We are analyzing items with more than 10 checkouts to understand circulation patterns across sub-collections.
188+
```
189+
190+
Next, insert a new code chunk and paste the plotting code we developed in the previous episode (ggplot2).
191+
192+
````
193+
```{r}
194+
#| label: plot-high-usage
195+
#| echo: false
196+
197+
# Filter for high usage
198+
booksHighUsage <- books2 %>%
199+
filter(!is.na(tot_chkout),
200+
tot_chkout > 10)
201+
202+
# Create the plot
203+
ggplot(data = booksHighUsage,
204+
aes(x = subCollection, y = tot_chkout)) +
205+
geom_boxplot(alpha = 0) +
206+
geom_jitter(alpha = 0.5, color = "tomato") +
207+
scale_y_log10() +
208+
labs(title = "Distribution of Checkouts by Sub-Collection",
209+
x = "Sub-Collection",
210+
y = "Total Checkouts (Log Scale)") +
211+
theme_bw() +
212+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
213+
```
214+
````
215+
216+
Setting `#| echo: false` will display the *plot* in the report, but hide the R *code* that generated it. This is often preferred for reports intended for non-coders.
217+
218+
## Rendering the Document
219+
220+
Now comes the magic. Click the **Render** button (blue arrow icon) at the top of the editor pane.
221+
222+
RStudio will:
223+
1. Run all your code chunks from scratch.
224+
2. Generate the plots and results.
225+
3. Combine them with your text.
226+
4. Create an HTML file in the same folder as your `.qmd` file, named after it (e.g., `library_usage_report.html` if you saved the document as `library_usage_report.qmd`).
227+
5. Open a preview of the report.
228+
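
The **Render** button is not the only way to render. As a minimal sketch, assuming the `quarto` R package is installed and that the document was saved as `library_usage_report.qmd` (a hypothetical file name, not one set by this episode), the same steps can be triggered from the R console:

```r
# Assumption: the document was saved as "library_usage_report.qmd"
# (hypothetical file name) in the current working directory.
report_file <- "library_usage_report.qmd"

# The rendered file takes its base name from the input file, not from the title.
expected_output <- paste0(tools::file_path_sans_ext(report_file), ".html")

# Render only when the quarto package and the file are actually available,
# so this sketch does nothing (rather than failing) in other setups.
if (requireNamespace("quarto", quietly = TRUE) && file.exists(report_file)) {
  quarto::quarto_render(report_file, output_format = "html")
}
```

Rendering from the console can be handy when a report needs to be regenerated from a script, while the button remains the simplest option when working interactively.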
229+
::::::::::::::::::::::::::::::::::::::: challenge
230+
231+
## Challenge: Add a Summary Table
232+
233+
1. Add a new header `## Summary Statistics` to your Quarto document.
234+
2. Insert a new code chunk.
235+
3. Write code to calculate the mean checkouts per format (Hint: use `group_by(format)` and `summarize()`).
236+
4. Render the document again to see your new table included in the report.
237+
238+
::::::::::::::: solution
239+
240+
## Solution
241+
242+
Add this to your document:
243+
244+
```markdown
245+
## Summary Statistics
246+
247+
The table below shows the average checkouts for each item format.
248+
```
249+
250+
````
251+
```{r}
252+
#| label: summary-table
253+
254+
books2 %>%
255+
group_by(format) %>%
256+
summarize(mean_checkouts = mean(tot_chkout, na.rm = TRUE)) %>%
257+
arrange(desc(mean_checkouts))
258+
```
259+
````
260+
261+
Render the document to see the updated report.
262+
263+
:::::::::::::::::::::::::
264+
265+
::::::::::::::::::::::::::::::::::::::::::::::::::
266+
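
The solution above prints the summary as a plain tibble. One possible refinement, sketched here under the assumption that the `knitr` package is installed, is to pass the summary to `knitr::kable()` so it appears as a formatted table in the report. The `format_summary` data frame and its numbers are invented for illustration and stand in for the result computed from `books2`.

```r
# Hypothetical stand-in for the summary computed from books2;
# the values are made up for illustration only.
format_summary <- data.frame(
  format = c("book", "serial", "map"),
  mean_checkouts = c(2.345, 0.5, 1.25)
)

# kable() converts a data frame into a table with rounded values
# and reader-friendly column names.
summary_table <- knitr::kable(
  format_summary,
  digits = 1,
  col.names = c("Format", "Mean checkouts")
)
summary_table
```

In a real report, the pipeline from the solution would be piped into `knitr::kable()` in place of the invented data frame.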
267+
## Why This Matters
268+
269+
By using Quarto, your report is now **reproducible**.
270+
271+
If you download a new version of `books.csv` next month:
272+
1. Save it to your `data/` folder.
273+
2. Open your Quarto document.
274+
3. Click **Render**.
275+
276+
Your report will automatically update with the new data, creating a fresh plot and table without you having to copy-paste a single thing.
277+
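
The idea behind this workflow can be sketched outside Quarto as well: the analysis code stays the same and only the input data changes. The example below uses base R so that it runs without extra packages; the helper `mean_checkouts_by_format()` and both small data frames are hypothetical and stand in for two monthly versions of `books.csv`.

```r
# Hypothetical helper: the same summary code is reused for every data version.
mean_checkouts_by_format <- function(books) {
  aggregate(tot_chkout ~ format, data = books, FUN = mean)
}

# Two invented snapshots standing in for this month's and next month's data.
books_this_month <- data.frame(
  format = c("book", "book", "map"),
  tot_chkout = c(2, 4, 1)
)
books_next_month <- data.frame(
  format = c("book", "book", "map"),
  tot_chkout = c(4, 6, 3)
)

# Identical code, different inputs: the results follow the data.
summary_this_month <- mean_checkouts_by_format(books_this_month)
summary_next_month <- mean_checkouts_by_format(books_next_month)
```

Rendering a Quarto document does the same thing on a larger scale: every chunk is re-run against whatever data is currently in the `data/` folder.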
278+
:::::::::::::::::::::::::::::::::::::::: keypoints
279+
280+
- **Quarto** allows you to mix code and text to create reproducible reports.
281+
- Use the **YAML header** to configure document metadata like title and output format.
282+
- **Code chunks** run R code and can display or hide input/output using options like `#| echo: false`.
283+
- **Rendering** the document executes the code and produces the final output (HTML, PDF, etc.).
284+
- This workflow saves time and reduces errors when reporting on data that changes over time.
285+
286+
::::::::::::::::::::::::::::::::::::::::::::::::::
