diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml
index 5207f5e..31729a7 100644
--- a/.github/workflows/R-CMD-check.yaml
+++ b/.github/workflows/R-CMD-check.yaml
@@ -39,7 +39,7 @@ jobs:
 
     steps:
       - name: Checkout ACRO
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
 
       - name: Setup Python
         uses: actions/setup-python@v5
diff --git a/.github/workflows/pkgdown.yaml b/.github/workflows/pkgdown.yaml
index 4570f4f..7f20b34 100644
--- a/.github/workflows/pkgdown.yaml
+++ b/.github/workflows/pkgdown.yaml
@@ -22,7 +22,7 @@ jobs:
 
     steps:
       - name: Checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
 
       - name: Setup Python
         uses: actions/setup-python@v5
diff --git a/.github/workflows/test-coverage.yaml b/.github/workflows/test-coverage.yaml
index ab15d9e..c540035 100644
--- a/.github/workflows/test-coverage.yaml
+++ b/.github/workflows/test-coverage.yaml
@@ -18,7 +18,7 @@ jobs:
 
     steps:
       - name: Checkout ACRO
-        uses: actions/checkout@v4
+        uses: actions/checkout@v5
 
       - name: Setup Python
         uses: actions/setup-python@v5
diff --git a/Installation Guides.md b/Installation Guides.md
index fea17c8..19e2a96 100644
--- a/Installation Guides.md
+++ b/Installation Guides.md
@@ -6,36 +6,36 @@ Keeping this comprehensive will require input from the community.
 So please email sacro.contact@uwe.ac.uk, or [raise an issue on the GitHub repository](https://github.com/AI-SDC/ACRO-R/issues/new/choose) if:
  - you have a setting that is not covered, or
- - the steps outlined below do not work for you,
+ - the steps outlined below do not work for you,
 
-**Please note**: most of the scenarios below assume that
+**Please note**: most of the scenarios below assume that
 - you have a working version of Python 3 (version 3.9 or higher) on your system
 - you are able to access a terminal or command prompt to write and execute some commands.
 ---
 ## Step 1 create a python virtual environment and install the base python package *acro*
-**In every case** we recommend that you create what is called a 'python virtual environment' called **r-acro**.
-Virtual environments (*venv's*) are recommended best practice.
-This is because they isolate the impact of any changes you make in one venv - such as adding or updating a package- from the rest of your system.
+**In every case** we recommend that you create what is called a 'python virtual environment' called **r-acro**.
+Virtual environments (*venv's*) are recommended best practice.
+This is because they isolate the impact of any changes you make in one venv - such as adding or updating a package- from the rest of your system.
 There are many tutorials available on the web if you get stuck. We do not endorse any particular site, but here are some examples:
 - [an overview with examples for windows/linux/mac](https://python.land/virtual-environments/virtualenv)
 - [another that also contains instructions for VSCode and Pycharm](https://realpython.com/python-virtual-environments-a-primer/)
-
-**For individual users** we suggest that you do this in your home directory where you should have write permission.
+
+**For individual users** we suggest that you do this in your home directory where you should have write permission.
 **To install site-wide** we assume you have access rights and know where your organisation's preferred locations are (for example, this might be ```/usr/local``` on a linux system).
 ### Make a dedicated virtual environment
 You can make a new virtual environment via:
 - the Anaconda GUI interface to the conda system
-- command line access - by opening a terminal or command prompt and entering the command:
+- command line access - by opening a terminal or command prompt and entering the command:
 ```sh
 conda create --n r-acro
 ```
-if you have a version of conda installed or
+if you have a version of conda installed or
 ```sh
 python -m venv ./r-acro
 ```
@@ -74,16 +74,16 @@ source r-acro/bin/activate
 #you should see the your command prompt change to show (r-acro)
 python -m pip install acro
 #assuming this completes successfuly you can now exit the virtual environment
-deactivate
+deactivate
 ```
 ---
 ## Step 2 Install the R packages *reticulate* and *acro*
 The *reticulate* package is the industry-standard method for supporting communications between R and Python.
-It provides the `plumbing` between the R `front-end'
+It provides the `plumbing` between the R `front-end'
 
-These commands should work whether you are
+These commands should work whether you are
 - working on a machine outside the TRE: in which case packages should install from a mirror of the CRAN service
 - working on a machine inside a TRE: in which case the administrator should have set up a local mirror of approved packages from CRAN
@@ -128,7 +128,7 @@ If you follow the menu items from ```Tools->Project Options ->Python``` or ```To
 library(reticulate)
 library("acro)"
 ```
-
+
 ### Option 3 - Editing your personal R preferences
 In your home directory create (or edit) the file ```.Rprofile``` file, adding the lines
@@ -139,5 +139,5 @@ Sys.setenv(RETICULATE_PYTHON_ENV=file.path(Sys.getenv("USERPROFILE"),"r-acro"))
-### Option 4- Making site-wide changes
+### Option 4- Making site-wide changes
 You can also edit the [site-wide Rprofile]() file to add these global environment variables, using replacing *~/r-acro* with the path to wherever you created the dedicated virtual environment.
diff --git a/example-notebook.Rmd b/example-notebook.Rmd
index 4ba9c8d..ff3a06a 100644
--- a/example-notebook.Rmd
+++ b/example-notebook.Rmd
@@ -25,12 +25,12 @@ acro_init()
 ```
 ### Load the data
-- The dataset used in this example notebook is the nursery dataset from OpenML.
-- The code below reads the data from a folder called "nursery_data" which we assume is at the same level as the folder where you are working.
+- The dataset used in this example notebook is the nursery dataset from OpenML.
+- The code below reads the data from a folder called "nursery_data" which we assume is at the same level as the folder where you are working.
 - The path might need to be changed if the data has been downloaded and stored elsewhere.
 ```{r}
-data = farff::readARFF("nursery_data/nursery.arff")
-data = as.data.frame(data)
+data <- farff::readARFF("nursery_data/nursery.arff")
+data <- as.data.frame(data)
 names(data)[names(data) == "class"] <- "recommend"
 ```
@@ -38,8 +38,8 @@ names(data)[names(data) == "class"] <- "recommend"
 - Convert the children column to integers, replacing 'more' with random int from range 4-10
 ```{r}
-data$children <-as.numeric(as.character(data$children))
-data[is.na(data)] <- round(runif(sum(is.na(data)), min = 4, max = 10),0)
+data$children <- as.numeric(as.character(data$children))
+data[is.na(data)] <- round(runif(sum(is.na(data)), min = 4, max = 10), 0)
 unique(data$children)
 ```
@@ -48,35 +48,35 @@ unique(data$children)
 #### ACRO Crosstab
 ```{r}
-index = data[, c("recommend")]
-columns = data[, c("parents")]
-values = data[, c("children")]
+index <- data[, c("recommend")]
+columns <- data[, c("parents")]
+values <- data[, c("children")]
 # convert the values to an array
-values = matrix(values, ncol=1)
+values <- matrix(values, ncol = 1)
 
-table = acro_crosstab(index = index, columns= columns, values = values, aggfunc = "sum")
+table <- acro_crosstab(index = index, columns = columns, values = values, aggfunc = "sum")
 table
 ```
 #### ACRO table
 ```{r}
-index = data[, c("parents")]
-columns = data[, c("social")]
+index <- data[, c("parents")]
+columns <- data[, c("social")]
 
-table = acro_table(index=index, columns=columns, deparse.level=1)
+table <- acro_table(index = index, columns = columns, deparse.level = 1)
 table
 ```
 #### ACRO pivot table
 ```{r}
-index = "parents"
-values = "children"
-aggfunc = list("mean", "std")
+index <- "parents"
+values <- "children"
+aggfunc <- list("mean", "std")
 
-table = acro_pivot_table(data, values=values, index=index, aggfunc=aggfunc)
+table <- acro_pivot_table(data, values = values, index = index, aggfunc = aggfunc)
 table
 ```
 #### ACRO histogram
@@ -91,9 +91,9 @@ In this example a different data set will be used. The lung dataset from the sur
 ```{r}
 # Load the lung dataset
 data(lung)
-#head(lung)
+# head(lung)
 
-acro_surv_func(time=lung$time, status=lung$status, output ="plot")
+acro_surv_func(time = lung$time, status = lung$status, output = "plot")
 ```
 ### Examples of producing regression outputs using acro
@@ -104,27 +104,27 @@ acro_surv_func(time=lung$time, status=lung$status, output ="plot")
 ```{r}
 data$recommend <- as.character(data$recommend)
-data$recommend[which(data$recommend=="not_recom")] <- "0"
-data$recommend[which(data$recommend=="recommend")] <- "1"
-data$recommend[which(data$recommend=="very_recom")] <- "2"
-data$recommend[which(data$recommend=="priority")] <- "3"
-data$recommend[which(data$recommend=="spec_prior")] <- "4"
+data$recommend[which(data$recommend == "not_recom")] <- "0"
+data$recommend[which(data$recommend == "recommend")] <- "1"
+data$recommend[which(data$recommend == "very_recom")] <- "2"
+data$recommend[which(data$recommend == "priority")] <- "3"
+data$recommend[which(data$recommend == "spec_prior")] <- "4"
 data$recommend <- as.numeric(data$recommend)
 ```
 ```{r}
 # extract relevant columns
-df = data[, c("recommend", "children")]
+df <- data[, c("recommend", "children")]
 # drop rows with missing values
-df = df[complete.cases(df), ]
+df <- df[complete.cases(df), ]
 # formula to fit
-formula = "recommend ~ children"
+formula <- "recommend ~ children"
 ```
 #### ACRO Linear Model
 ```{r}
-acro_lm(formula=formula, data=df)
+acro_lm(formula = formula, data = df)
 ```
 #### ACRO Logit Model
@@ -133,25 +133,25 @@ We use a different combination of variables from the original dataset.
 ```{r}
 # extract relevant columns
-df = data[, c("finance", "children")]
+df <- data[, c("finance", "children")]
 # drop rows with missing values
-df = df[complete.cases(df), ]
+df <- df[complete.cases(df), ]
 # convert finance to numeric
-df = transform(df, finance = as.numeric(finance))
+df <- transform(df, finance = as.numeric(finance))
 # subtract 1 to make 1s and 2S into 0a and 1s
-df$finance <- df$finance -1
+df$finance <- df$finance - 1
 # formula to fit
-formula = "finance ~ children"
+formula <- "finance ~ children"
 ```
 ```{r}
-acro_glm(formula=formula, data=df, family="logit")
+acro_glm(formula = formula, data = df, family = "logit")
 ```
 #### ACRO Probit Model
 ```{r}
-acro_glm(formula=formula, data=df, family="probit")
+acro_glm(formula = formula, data = df, family = "probit")
 ```
 ### Examples of functionality to let users manage their output
@@ -185,12 +185,12 @@ acro_add_comments("output_1", "This is a crosstab on the nursery dataset.")
 ```
 #### Finalise
-- The users must call finalise() at the end of each session.
-- Each output is saved to a CSV file.
-- The SDC analysis for each output is saved to a json file or Excel file
+- The users must call finalise() at the end of each session.
+- Each output is saved to a CSV file.
+- The SDC analysis for each output is saved to a json file or Excel file
 (depending on the extension of the name of the file provided as an input to the function)
 ```{r}
-#acro_finalise("RTEST", "xlsx")
+# acro_finalise("RTEST", "xlsx")
 acro_finalise("RTEST", "json")
 ```