ggplot2-visualizations/GGPLOT2 using RRT data.Rmd at main · Pablo-source/ggplot2-visualizations · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
---
title: "GGPLOT2 plots using RTT data"
author: "PLR"
date: "2023-01-23"
output:
  html_document:
    toc: yes
    toc_float: yes
    toc_collapsed: yes
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## 1. Create new project and folder structure

We start this anaysis by creating a new project called **ggplot2-visualizations** and withing this project we use the above function to create the folder structure required to store input and output files

```{r Setup project folder structure,echo=TRUE}
library(here)

project_setup <-function(){

  if(!dir.exists("data")){dir.create(here("data"))}
  if(!dir.exists("plots")){dir.create(here("plots"))}
  if(!dir.exists("Archive")){dir.create(here("Archive"))}
  if(!dir.exists("Test")){dir.create(here("Test"))}

}

# Run code below to use function and create folder structure
project_setup()
```


## 2. Data used for building plots

We are going to explore how to build GGPLOT2 plots using RTT data.

[RTT_Waiting_Times]<https://www.england.nhs.uk/statistics/statistical-work-areas/rtt-waiting-times/rtt-data-2022-23/>

In particular we will be focusing on the England-level time series. The file can be found in the website below:

England-level time series:
- RTT Overview Timeseries Including Estimates for Missing Trusts Nov2022 (XLS,98k)

![RTT main stats website](/home/pablo/Documents/Pablo_ubuntu/Repos/Pablo_source_repo/ggplot2-visualizations/images/NHS_England_RTT_Waiting_Times_website.png)
To download the England level time series we can look at the URL address where the .csv file is saved:

In Firefox we do that by by clicking on the three lines menus on the top right corner, then selecting More tools and finally we selet  **Page Source**

```{r Source code including RTT data, echo=TRUE}
URL <-c('<p><strong>England-level time series</strong></p>
<p><a href="https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2023/01/RTT-Overview-Timeseries-Including-Estimates-for-Missing-Trusts-Nov22-XLS-98K-63230.xlsx">RTT Overview Timeseries Including Estimates for Missing Trusts Nov22 (XLS, 98K)</a></p>')
URL
```
As can asee in the HTTP code above, the href tag will provide us with the .xlsx file that is published and uploaded to the RTT waiting times series [RTT_Waiting_Times]<https://www.england.nhs.uk/statistics/statistical-work-areas/rtt-waiting-times/rtt-data-2022-23/>


## 3. Load data into R from online CSV files

We will download an .xlsx file from an URL into our /data project sub-folder

Download RTT Time series data using this code below into our data sub-folder.

- Time series data
- CHROME: Inspect element. Copy outher HTML
- We can find the .csv data by looking at <href> tags in the HTML code
- <p><strong>England-level time series</strong></p>
- <p><a href="https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2023/01/RTT-Overview-Timeseries-Including-Estimates-for-Missing-Trusts-Nov22-XLS-98K-63230.xlsx">RTT Overview Timeseries Including Estimates for Missing Trusts Nov22 (XLS, 98K)</a></p>'

Template to load xlsx file from URL
- urlFile = "https://docs.google.com/spreadsheets/d/1SF0PkBz9BR4yqiQ27Bt5OsD33Y8Rt5lh/edit?usp=sharing&ouid=107152468748636733235&rtpof=true&sd=true"
- xlsFile = "refugios_nayarit.xlsx"
- download.file(url=urlFile, destfile=xlsFile, mode="wb")


```{r First check your project folder}
library(here)
here()
```

```{r Download data from URL, echo=TRUE}
# Template to load xlsx file from URL
# urlFile = "https://docs.google.com/spreadsheets/d/1SF0PkBz9BR4yqiQ27Bt5OsD33Y8Rt5lh/edit?usp=sharing&ouid=107152468748636733235&rtpof=true&sd=true"
# xlsFile = "refugios_nayarit.xlsx"
# download.file(url=urlFile, destfile=xlsFile, mode="wb")

RTT_TS_data <- function() {

  if(!dir.exists("data")){dir.create("data")}

  # England-level time series
  # Download Excel file to a Project sub-folder called "data"
  # Created previously using an adhoc project structure function

  xlsFile = "RTT_TS_data.xlsx"

  download.file(
    url = 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2023/01/RTT-Overview-Timeseries-Including-Estimates-for-Missing-Trusts-Nov22-XLS-98K-63230.xlsx',
    destfile = here("data",xlsFile),
    mode ="wb"
  )

}
# Download RTT data function (no arguments)
RTT_TS_data()
```

As we can see below, the Excel file has been downloaded to our data folder

![Excel file downloaded](C:/R_WDP/ggplot2-visualizations/plots/Download XLSX from URL.png)

## 4. Import Excel file into R

### 4.1 List files in Downloads folder

We need to skip some rows and select just a handful of columns from the original Excel file we import into R. We use

First check available files in /data sub-folder. We want to import the Excel file called "RTT_TS_data.xlsx"

```{r list excel files data folder, echo=FALSE}
# pacman::p_load(readxl,here,dplyr,janitor)
library(readxl)
Excel_file <-list.files (path = "./data" ,pattern = "xlsx$")
Excel_file

# [1] "RTT_TS_data.xlsx"
Excel_tabs <- excel_sheets("./data/RTT_TS_data.xlsx")
Excel_tabs
```

### 4.1 Read in Excel file into R

Then we read this *RTT_TS_data.xlsx* file into R, using read_excel() file

```{r}
library(dplyr)
library(janitor)
RTT_Data_imported <- read_excel(here("data","RTT_TS_data.xlsx"),sheet = "Full Time Series",
                       skip = 10 , na = "-") %>%
  clean_names()

RTT_data_sub <- RTT_Data_imported %>%
  select(x2,total_waiting_mil)
RTT_data_sub

RTT_data_subset <- RTT_data_sub %>% select(Date = x2, Total_waiting = total_waiting_mil)
RTT_data_subset

RTT_data <- RTT_data_subset %>%
  select(Date, Total_waiting_M = Total_waiting) %>%
  na.omit()
RTT_data

```