---
title: Reproducible Reports with Quarto
teaching: 45
exercises: 15
source: Rmd
---

::::::::::::::::::::::::::::::::::::::: objectives

- Describe the value of reproducible reporting.
- Create a new Quarto document (`.qmd`) in RStudio.
- Use Markdown syntax to format text.
- Create and run code chunks within a Quarto document.
- Render a Quarto document to an HTML report.

::::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::: questions

- How can I combine my code, results, and narrative into a single document?
- How can I automatically update my reports when my data changes?
- What is Quarto and how does it differ from a standard R script?

::::::::::::::::::::::::::::::::::::::::::::::::::

```{r, include=FALSE}
source("files/download_data.R")
library(tidyverse)
# Read raw data and apply cleaning steps from previous episodes
books2 <- read_csv("data/books.csv") %>%
  rename(
    title = X245.ab,
    author = X245.c,
    callnumber = CALL...BIBLIO.,
    isbn = ISN,
    pubyear = X008.Date.One,
    subCollection = BCODE1,
    format = BCODE2,
    location = LOCATION,
    tot_chkout = TOT.CHKOUT,
    loutdate = LOUTDATE,
    subject = SUBJECT,
    callnumber2 = CALL...ITEM.
  ) %>%
  mutate(
    pubyear = as.integer(pubyear),
    subCollection = recode(subCollection,
      "-" = "general collection",
      u = "government documents",
      r = "reference",
      b = "k-12 materials",
      j = "juvenile",
      s = "special collections",
      c = "computer files",
      t = "theses",
      a = "archives",
      z = "reserves"
    ),
    format = recode(format,
      a = "book",
      e = "serial",
      w = "microform",
      s = "e-gov doc",
      o = "map",
      n = "database",
      k = "cd-rom",
      m = "image",
      "5" = "kit/object",
      "4" = "online video"
    )
  )
```

## Introduction to Reproducible Reporting

So far, we have been writing code in `.R` scripts. This is excellent for data analysis, but what happens when you need to share your findings with a colleague or a library director? You might copy a plot into a Word document or an email, then type out your interpretation.

But what if the data changes next month? You would have to re-run your script, re-save the plot, copy it back into Word, and update your text. This manual process is tedious and error-prone.

**Quarto** allows you to combine your code, its output (plots, tables), and your narrative text into a single document. When you "render" the document, Quarto runs your R code and produces a polished report (HTML, PDF, or Word) automatically.

## Creating a Quarto Document

To create a new Quarto document in RStudio:

1. Click the **File** menu.
2. Select **New File** > **Quarto Document...**
3. In the dialog box, give your document a **Title** (e.g., "Library Usage Report") and enter your name as **Author**.
4. Ensure **HTML** is selected as the output format.
5. Click **Create**.

RStudio will open a new, untitled file with some example content. Save it (**File** > **Save**) in your project folder, the one that contains `data/`, for example as `library_usage_report.qmd`. Notice the file extension is `.qmd`. Saving it there matters because Quarto runs your code with the document's folder as the working directory, so relative paths such as `data/books.csv` are resolved from that folder.

::::::::::::::::::::::::::::::::::::::::: callout

## Quarto vs. R Markdown

If you have used R before, you might be familiar with R Markdown (`.Rmd`). Quarto (`.qmd`) is the next generation of R Markdown. It works in a very similar way, but it supports more languages (such as Python and Julia) and has built-in features for scientific publishing, such as cross-references and callout blocks.

:::::::::::::::::::::::::::::::::::::::::

## Anatomy of a Quarto Document

A Quarto document has three main parts:

### 1. The YAML Header

At the very top, enclosed between two lines of `---`, is the **YAML Header**. This contains metadata about the document.

```yaml
---
title: "Library Usage Report"
author: "Your Name"
format: html
---
```

### 2. Markdown Text

Everything outside the YAML header and the code chunks is where you write your narrative. You format this text with **Markdown** syntax:

- `**Bold**` for **bold text**
- `*Italics*` for *italics*
- `# Heading 1` for a top-level section heading
- `## Heading 2` for a subsection heading
- `- List item` for bullet points

### 3. Code Chunks

Code chunks are where your R code lives. They start with ` ```{r} ` and end with ` ``` `.

````
```{r}
# This is a code chunk
summary(cars)
```
````

You can insert a new chunk by clicking the **+C** button in the editor toolbar, or by pressing <kbd>Ctrl</kbd>+<kbd>Alt</kbd>+<kbd>I</kbd> (Windows/Linux) or <kbd>Cmd</kbd>+<kbd>Option</kbd>+<kbd>I</kbd> (Mac).

## Your First Report

Let's clean up the example file and create a report using our `books` data.

1. Delete everything in the file *below* the YAML header.
2. Add a new **setup** code chunk to load our libraries and prepare the data.

```{{r}}
#| label: setup
#| include: false

library(tidyverse)

# Load the data, rename the columns we need, and recode some sub-collection codes
books2 <- read_csv("data/books.csv") %>%
  rename(
    subCollection = BCODE1,
    tot_chkout = TOT.CHKOUT,
    format = BCODE2
  ) %>%
  mutate(
    subCollection = recode(subCollection,
      "-" = "general collection",
      j = "juvenile",
      b = "k-12 materials"
    )
  )
```
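
To check what the setup chunk does before pointing it at the real file, here is a minimal sketch that runs the same `rename()` and `recode()` steps on a made-up three-row tibble. The values are invented for illustration; only the column names `BCODE1`, `BCODE2`, and `TOT.CHKOUT` come from `books.csv`. It also shows that `recode()` leaves codes without a listed replacement unchanged, so some sub-collections may still appear as single-letter codes in your report.

```r
library(dplyr)

# Made-up stand-in for three rows of books.csv. Only the column names
# BCODE1, BCODE2 and TOT.CHKOUT come from the real file; values are invented.
raw <- tibble(
  BCODE1 = c("-", "j", "u"),
  BCODE2 = c("a", "a", "e"),
  TOT.CHKOUT = c(12, 0, 3)
)

# Same steps as the setup chunk: rename, then recode sub-collection codes.
toy <- raw %>%
  rename(
    subCollection = BCODE1,
    tot_chkout = TOT.CHKOUT,
    format = BCODE2
  ) %>%
  mutate(
    subCollection = recode(subCollection,
      "-" = "general collection",
      j = "juvenile",
      b = "k-12 materials"
    )
  )

# "u" has no replacement listed, so recode() leaves it as "u".
toy
```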

::::::::::::::::::::::::::::::::::::::::: callout

## Chunk Options

Notice the lines starting with `#|`. These are **chunk options**:

- `#| label: setup` gives the chunk a name.
- `#| include: false` runs the code but hides both the code and its output in the final report. This is useful for setup steps, such as loading packages and data, that readers do not need to see.

:::::::::::::::::::::::::::::::::::::::::
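
`label` and `include` are only two of the available chunk options. The sketch below uses two more Quarto execution options, `message: false` and `warning: false`. These run the code and show it in the report, but keep any messages or warnings it prints out of the output. You would place these lines between ` ```{r} ` and ` ``` ` in your `.qmd` file. The label `load-packages` is just an example name; each chunk label must be unique within a document. Outside Quarto, the `#|` lines are ordinary R comments, so the code also runs as a plain script.

```r
#| label: load-packages
#| message: false
#| warning: false

# Attaching a package can print start-up messages; message: false keeps
# them out of the rendered report while the code itself is still shown.
library(dplyr)

# Check that the package is attached before later chunks rely on it.
dplyr_attached <- "package:dplyr" %in% search()
dplyr_attached
```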

### Adding Analysis

Now, let's add a section header and some text.

```markdown
## High Usage Items

We are analyzing items with more than 10 checkouts to understand circulation patterns across sub-collections.
```

Next, insert a new code chunk and paste in the `ggplot2` plotting code we developed in the previous episode.

````
```{r}
#| label: plot-high-usage
#| echo: false

# Filter for high usage
booksHighUsage <- books2 %>%
  filter(!is.na(tot_chkout),
         tot_chkout > 10)

# Create the plot
ggplot(data = booksHighUsage,
       aes(x = subCollection, y = tot_chkout)) +
  geom_boxplot(alpha = 0) +
  geom_jitter(alpha = 0.5, color = "tomato") +
  scale_y_log10() +
  labs(title = "Distribution of Checkouts by Sub-Collection",
       x = "Sub-Collection",
       y = "Total Checkouts (Log Scale)") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
````
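
To see exactly which rows the filtering step keeps, you can try it on a small made-up tibble (the values below are invented). `filter()` already drops rows where a condition evaluates to `NA`, so `!is.na(tot_chkout)` is not strictly required. Writing it out makes the intent explicit to anyone reading the report's source. The result is stored as `toy_high_usage` so it does not overwrite your real `booksHighUsage`.

```r
library(dplyr)

# Made-up checkout counts, including one missing value.
toy <- tibble(
  subCollection = c("general collection", "juvenile", "reference", "juvenile"),
  tot_chkout = c(25, 4, NA, 11)
)

# Same filter as in the plot chunk: keep items with more than 10 checkouts.
# The NA row and the row with 4 checkouts are dropped.
toy_high_usage <- toy %>%
  filter(!is.na(tot_chkout),
         tot_chkout > 10)

toy_high_usage
```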

Setting `#| echo: false` will display the *plot* in the report, but hide the R *code* that generated it. This is often preferred for reports intended for non-coders.
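
Reports usually also benefit from a figure caption and, for readers using screen readers, alternative text. Quarto provides the `fig-cap` and `fig-alt` cell options for these. The sketch below adds them to a cut-down version of the plot chunk. It uses a made-up `toy_high_usage` data frame so that it runs on its own, a hypothetical label `plot-captioned` (labels must not repeat within a document), and example wording for the caption and alt text.

```r
#| label: plot-captioned
#| echo: false
#| fig-cap: "Checkouts by sub-collection for items with more than 10 checkouts"
#| fig-alt: "Box plots of total checkouts for each sub-collection, on a log scale."

library(ggplot2)

# Made-up stand-in for booksHighUsage; use the real filtered data in your report.
toy_high_usage <- data.frame(
  subCollection = c("general collection", "general collection",
                    "general collection", "juvenile", "juvenile"),
  tot_chkout = c(12, 40, 25, 15, 22)
)

p <- ggplot(toy_high_usage, aes(x = subCollection, y = tot_chkout)) +
  geom_boxplot() +
  scale_y_log10() +
  labs(x = "Sub-Collection", y = "Total Checkouts (Log Scale)") +
  theme_bw()

p
```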

## Rendering the Document

Now comes the magic. Click the **Render** button (blue arrow icon) at the top of the editor pane.

RStudio will:

1. Run all your code chunks from scratch in a fresh R session, so the document must load every package and dataset it uses.
2. Generate the plots and results.
3. Combine them with your text.
4. Create an HTML file with the same name as your `.qmd` file (for example, `library_usage_report.html` for `library_usage_report.qmd`) in the same folder.
5. Open a preview of the report.
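
You can also render from the R console instead of using the **Render** button. This is handy if you later want to re-run a report from a script. The sketch below assumes the `quarto` R package is installed (`install.packages("quarto")`) and can find a Quarto installation. It skips rendering when either the package or the file is missing. The file name `library_usage_report.qmd` is only an example. The last lines show how the output name follows from the `.qmd` name rather than from the document title.

```r
qmd_file <- "library_usage_report.qmd"

# Render only when the quarto package and the document are both available.
# quarto_render() calls the Quarto command-line tool, which runs the code
# in a separate R session, as the Render button does.
if (requireNamespace("quarto", quietly = TRUE) && file.exists(qmd_file)) {
  quarto::quarto_render(qmd_file)
}

# The HTML output takes its name from the .qmd file, not from the title.
html_file <- sub("\\.qmd$", ".html", qmd_file)
html_file
```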

::::::::::::::::::::::::::::::::::::::: challenge

## Challenge: Add a Summary Table

1. Add a new header `## Summary Statistics` to your Quarto document.
2. Insert a new code chunk.
3. Write code to calculate the mean checkouts per format (Hint: use `group_by(format)` and `summarize()`).
4. Render the document again to see your new table included in the report.

::::::::::::::: solution

## Solution

Add this to your document:

```markdown
## Summary Statistics

The table below shows the average checkouts for each item format.
```

````
```{r}
#| label: summary-table

books2 %>%
  group_by(format) %>%
  summarize(mean_checkouts = mean(tot_chkout, na.rm = TRUE)) %>%
  arrange(desc(mean_checkouts))
```
````

Render the document to see the updated report.
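
The solution prints the summary as a plain data frame. For a neater table in the HTML report, one option is `knitr::kable()`, which converts a data frame into a formatted Markdown table. knitr is needed to run R code in Quarto, so it should already be installed. The sketch below runs the same summary on a made-up tibble so you can see the result. In your document you would start from `books2` instead of `toy`. The column headings passed to `col.names` are only examples.

```r
library(dplyr)

# Made-up rows standing in for books2; the values are illustrative only.
toy <- tibble(
  format = c("book", "book", "serial", "serial", "map"),
  tot_chkout = c(4, 8, 1, NA, 0)
)

# Same summary as the solution: mean checkouts per format, highest first.
summary_tbl <- toy %>%
  group_by(format) %>%
  summarize(mean_checkouts = mean(tot_chkout, na.rm = TRUE)) %>%
  arrange(desc(mean_checkouts))

# kable() formats the result as a Markdown table; digits = 1 rounds the means.
knitr::kable(summary_tbl, digits = 1,
             col.names = c("Format", "Mean checkouts"))
```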

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

## Why This Matters

By using Quarto, your report is now **reproducible**.

If you download a new version of `books.csv` next month:

1. Save it to your `data/` folder, replacing the old `books.csv`.
2. Open your Quarto document.
3. Click **Render**.

Your report will automatically update with the new data, creating a fresh plot and table without you having to copy and paste a single thing.

:::::::::::::::::::::::::::::::::::::::: keypoints

- **Quarto** allows you to mix code and text to create reproducible reports.
- Use the **YAML header** to configure document metadata like title and output format.
- **Code chunks** run R code and can display or hide input/output using options like `#| echo: false`.
- **Rendering** the document executes the code and produces the final output (HTML, PDF, etc.).
- This workflow saves time and reduces errors when reporting on data that changes over time.

::::::::::::::::::::::::::::::::::::::::::::::::::