Merged
11 changes: 11 additions & 0 deletions episodes/install.R
@@ -0,0 +1,11 @@
+pak::pak("knitr")
+pak::pak("xml2")
+pak::pak("lintr")
+library(knitr)
+opts_chunk$set(comment = "")
+library(xml2)
+x <- read_xml("<foo> <bar> text <baz/> </bar> </foo>")
+x
+xml_name(x)
+library(lintr)
+available_tags(packages = "lintr")
8 changes: 4 additions & 4 deletions learners/mpox_data_cleaning_pipeline.Rmd
@@ -12,7 +12,7 @@
In this document, we use the [{cleanepi}](https://epiverse-trace.github.io/cleanepi/)
package to clean and standardize a messy mpox (formerly known as Monkeypox)
dataset obtained from the [global.health](https://global.health/) platform.
-The dataset is in a `csv` format available on this [link](https://mpox-2024.s3.eu-central-1.amazonaws.com/latest.csv).
+The dataset is in a `csv` format available on this [link](https://ivcjkmyexc.execute-api.eu-central-1.amazonaws.com/web/url?folder=&file_name=latest.csv).

Check warning on line 15 in learners/mpox_data_cleaning_pipeline.Rmd
GitHub Actions / Build markdown source files if valid
[uninformative link text]: [link](https://ivcjkmyexc.execute-api.eu-central-1.amazonaws.com/web/url?folder=&file_name=latest.csv)

We begin by importing the data into R and then utilize {cleanepi}
functionalities to perform the following operations in a streamlined manner:
@@ -28,7 +28,7 @@
```{r echo=TRUE, eval=FALSE}
cleaned_data <-
data.table::fread(
-"https://mpox-2024.s3.eu-central-1.amazonaws.com/latest.csv") |>
+"https://ivcjkmyexc.execute-api.eu-central-1.amazonaws.com/web/url?folder=&file_name=latest.csv") |>
cleanepi::replace_missing_values(na_strings = "") |>
cleanepi::remove_constants() |>
cleanepi::standardize_dates(error_tolerance = 1) |>
@@ -70,7 +70,7 @@

## Data loading and inspection

::: {.callout-important}

Check warning on line 73 in learners/mpox_data_cleaning_pipeline.Rmd
GitHub Actions / Build markdown source files if valid
[unknown div] callout-important
## Data download

The data file is quite large ($\sim 17$ MB) and may fail to download using
@@ -82,10 +82,10 @@
with both `View()` and `wakefield::table_heat()` functions to understand the
distribution of missing values within the dataset.

-```{r echo=TRUE, cache=TRUE, eval=TRUE}
+```{r echo=TRUE, eval=TRUE}
# import the data
data_in <- data.table::fread(
-"https://mpox-2024.s3.eu-central-1.amazonaws.com/latest.csv"
+"https://ivcjkmyexc.execute-api.eu-central-1.amazonaws.com/web/url?folder=&file_name=latest.csv"
)

# Visualise the distribution of the different types as well as missing data