py-rocket has separate R and Python installations because a variety of system library linkages (GDAL and others, depending on what you are doing) will break if you do not use the right ones. This is handled via the system `PATH`, which tells programs where to look for the files they need.

When you activate R (in JupyterLab, RStudio, R, or VSCode), the path will not have conda. If you use the reticulate package to run Python in R, it is recommended to use `py_require()` and install the needed packages into the ephemeral environment that reticulate creates. If you try to use `reticulate::use_condaenv()` in RStudio, functions that try to get files from the cloud will not work, since RStudio sets up `LD_LIBRARY_PATH` and there is no real way to reset it. You can use `reticulate::use_condaenv()` in R started from the terminal or JupyterLab, but be aware that it will put conda on the `PATH`, and subsequent R functions that use GDAL will break or issue warnings.

Try this in R (RStudio or the R kernel in Jupyter Lab):
```
Sys.getenv("PATH")
```
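To go a step further and check for conda programmatically, the `PATH` can be split into its entries and scanned. This is a minimal base-R sketch: the helper names are hypothetical, and it assumes a conda directory can be recognized by the string `conda` in its path, which may not hold for every installation. Per the notes above, you would expect `has_conda_on_path()` to be `FALSE` in a normally started R session and `TRUE` after `use_condaenv()` has put conda on the `PATH`.

```r
# Hypothetical helper: return the PATH entries that look like conda directories.
# Assumes a conda install can be recognized by "conda" appearing in its path.
conda_path_entries <- function(path = Sys.getenv("PATH")) {
  entries <- strsplit(path, .Platform$path.sep, fixed = TRUE)[[1]]
  entries[grepl("conda", entries, fixed = TRUE)]
}

# TRUE if at least one conda directory is on the given PATH
has_conda_on_path <- function(path = Sys.getenv("PATH")) {
  length(conda_path_entries(path)) > 0
}

conda_path_entries()
has_conda_on_path()
```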
Try this in a Jupyter Notebook in Jupyter Lab with the Python kernel:

or use the helper script plus an `install.R` file in your Dockerfile:
```
COPY install.R /tmp/
RUN /pyrocket_scripts/install-r-packages.sh /tmp/install.R
```
### `py_require()`
To use Python from R, use the `reticulate` library. If you only need a handful of Python packages, it will simplify things if you use `py_require()`. This will link to the GDAL system libraries that R uses. Like this:
```
library(reticulate)
py_require("xarray")
```
One gotcha is that reticulate creates a cache in `~/.cache/R/reticulate`, and it might not be easy to switch later to a conda environment for your Python binary. I often had to run
```
rm -rf ~/.cache/R/reticulate
```
in a terminal to get reticulate to let me use `use_condaenv("notebook")` in another R session.
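If you would rather clear the cache from R, a base-R equivalent of that `rm -rf` is sketched below. The helper name is hypothetical; it assumes the cache lives at `tools::R_user_dir("reticulate", "cache")`, which on Linux resolves to `~/.cache/R/reticulate` unless `R_USER_CACHE_DIR` or `XDG_CACHE_HOME` is set.

```r
# Hypothetical helper: delete reticulate's cache directory from within R,
# as an alternative to `rm -rf ~/.cache/R/reticulate` in a terminal.
# On Linux the default resolves to ~/.cache/R/reticulate unless
# R_USER_CACHE_DIR or XDG_CACHE_HOME is set.
clear_reticulate_cache <- function(dir = tools::R_user_dir("reticulate", which = "cache")) {
  if (!dir.exists(dir)) {
    message("No reticulate cache found at ", dir)
    return(invisible(FALSE))
  }
  status <- unlink(dir, recursive = TRUE)  # unlink() returns 0 on success
  invisible(status == 0 && !dir.exists(dir))
}
```

As with the terminal command, start a new R session afterwards before calling `use_condaenv("notebook")`.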
### Using a conda environment
You can also use reticulate with the conda environment and all of its pre-installed packages **outside of RStudio**.
```
library(reticulate)
use_condaenv("notebook")
```
**Why activating conda causes problems for R**

When we use a conda environment, the `PATH` is altered so that the conda environment directory appears first on the `PATH`. Any R packages that need a particular system library that is also in conda (like GDAL) are likely to throw mismatch errors. GDAL also depends on various dynamic libraries that need to be linked correctly.
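To see why the order matters, here is a small base-R sketch that mimics command lookup: it prepends a directory to a `PATH`-style string (the way activating a conda environment does) and reports the first directory that contains a given program. The helper names and the throwaway `gdalinfo` files are illustrative assumptions, and the sketch only models executable lookup on `PATH`, not how the dynamic linker resolves shared libraries.

```r
# Prepend a directory to a PATH-style string, as activating a conda env does.
prepend_path <- function(dir, path = Sys.getenv("PATH")) {
  paste(c(dir, path[nzchar(path)]), collapse = .Platform$path.sep)
}

# Return the first PATH entry containing `cmd`, searching entries left to
# right like the shell does; NA if no entry has it.
first_on_path <- function(cmd, path = Sys.getenv("PATH")) {
  entries <- strsplit(path, .Platform$path.sep, fixed = TRUE)[[1]]
  candidates <- file.path(entries, cmd)
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) NA_character_ else found[[1]]
}

# Two throwaway directories that both provide a file named `gdalinfo`
system_bin <- file.path(tempdir(), "system_bin")
conda_bin <- file.path(tempdir(), "conda_bin")
dir.create(system_bin, showWarnings = FALSE)
dir.create(conda_bin, showWarnings = FALSE)
invisible(file.create(file.path(system_bin, "gdalinfo"), file.path(conda_bin, "gdalinfo")))

first_on_path("gdalinfo", system_bin)                           # the system copy
first_on_path("gdalinfo", prepend_path(conda_bin, system_bin))  # the conda copy now wins
```

If `gdalinfo` is installed, running `first_on_path("gdalinfo")` on the real `PATH` before and after `use_condaenv()` shows which copy a shell command would pick up.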
## Dealing with SSL mismatch errors
When you use reticulate in R with `use_condaenv()` and call a function that needs to download data, you are liable to get an OpenSSL mismatch error. You can add the below to `/etc/rstudio/rserver.conf`:

This lets R know where to look for the SSL libraries, but it will break any R packages that need GDAL. Make sure that `.Renviron` does not set `LD_LIBRARY_PATH`, or this solution will not work. I don't know why, but it breaks.
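Since this depends on `.Renviron` not setting `LD_LIBRARY_PATH`, a quick check can save some head-scratching. A minimal base-R sketch; the function name is hypothetical, and it only inspects the user-level `~/.Renviron`, not a project-level one.

```r
# Hypothetical helper: does an .Renviron file set LD_LIBRARY_PATH?
# Defaults to the user-level file in the home directory; a project-level
# .Renviron in the working directory is not checked here.
renviron_sets_ld_library_path <- function(file = file.path(Sys.getenv("HOME"), ".Renviron")) {
  if (!file.exists(file)) return(FALSE)
  lines <- trimws(readLines(file, warn = FALSE))
  lines <- lines[!startsWith(lines, "#")]  # ignore comment lines
  any(grepl("^LD_LIBRARY_PATH[[:space:]]*=", lines))
}

renviron_sets_ld_library_path()
Sys.getenv("LD_LIBRARY_PATH")  # what the running session actually has
```

Printing `LD_LIBRARY_PATH` as well shows what the running session ended up with, which in RStudio may differ from what `.Renviron` says, since RStudio sets it at startup.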
## Examples of workflows that use Python in an R environment
### Staying in the R GDAL system libraries will work

Use reticulate with `py_require()` to run Python commands. This works because it uses the same GDAL libraries as R does.
### Mixing conda GDAL and R GDAL system libraries does not work
The problems happen when you try to use a conda environment: Python (via reticulate) then uses, and expects, the GDAL libraries in the conda environment. *RStudio sets up `LD_LIBRARY_PATH` as soon as it starts up*. This does not happen if you start R via the JupyterLab kernel or via `R` in the terminal.

This will work in a plain R session (started from the terminal or JupyterLab) but not in one started in RStudio:
```
# may or may not work, but the dynamic libraries are definitely wrong
library(terra)
r <- rast(url, vsi = TRUE)
```
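To avoid hitting the RStudio case by accident, the conda switch can be guarded. A minimal sketch, assuming RStudio sets the `RSTUDIO` environment variable to `"1"` in the R sessions it starts; `in_rstudio()` and `use_notebook_env()` are hypothetical helper names. Outside RStudio the guard simply calls `reticulate::use_condaenv("notebook", required = TRUE)`; inside RStudio it stops and points to `py_require()` instead.

```r
# Hypothetical check: assumes RStudio sets RSTUDIO="1" in its R sessions.
in_rstudio <- function() identical(Sys.getenv("RSTUDIO"), "1")

# Hypothetical guard: only switch to the conda env outside RStudio, since
# RStudio has already set LD_LIBRARY_PATH by the time this runs.
use_notebook_env <- function(envname = "notebook") {
  if (in_rstudio()) {
    stop("Started from RStudio: use py_require() instead of use_condaenv().",
         call. = FALSE)
  }
  reticulate::use_condaenv(envname, required = TRUE)
}
```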
Here is how to keep conda GDAL and R GDAL completely separate in a script, assuming you do not want to use `py_require()` (which is the better way to go). You can use `callr::r()` to start a clean R session from within a running R session. But `callr` only returns R objects, so you will need to do something with `ds`, like saving it to NetCDF:
```
# `url` is assumed to already hold the dataset URL
out <- file.path(Sys.getenv("HOME"), "chla_subset.nc")

callr::r(function(url, out) {
  library(reticulate)
  use_condaenv("notebook", required = TRUE)

  xr <- import("xarray", convert = FALSE)
  py <- import("builtins", convert = FALSE)

  ds <- xr$open_dataset(url, engine = "h5netcdf")
  da <- ds[["CHLA"]]$
    isel(time = 0L, z = 0L)$
    sel(lat = py$slice(10, 5),
        lon = py$slice(-140, -135))

  # write the subset to disk; only the file path comes back to the parent session
  da$to_netcdf(out)
  out
}, args = list(url = url, out = out))
```
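If you would rather not depend on callr, the same isolation idea can be sketched in base R: run a separate `Rscript` process, so environment changes made there (such as conda being put on the `PATH`) stay in the child and never reach the parent session. This is a minimal sketch with a hypothetical helper name and a made-up directory; unlike `callr::r()`, it does not serialize arguments or return values, so only printed text comes back and anything larger still has to go through a file.

```r
# Hypothetical helper: evaluate R code in a separate Rscript process and
# return whatever it prints to stdout as a character vector.
run_in_child_r <- function(code) {
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(rscript, c("-e", shQuote(code)), stdout = TRUE)
}

parent_path <- Sys.getenv("PATH")

# The child changes its own PATH; take the last line of output in case a
# site profile prints anything at startup.
child_path <- tail(run_in_child_r(
  'Sys.setenv(PATH = "/hypothetical/conda/bin"); writeLines(Sys.getenv("PATH"))'
), 1)

child_path                                  # the child's modified PATH
identical(Sys.getenv("PATH"), parent_path)  # the parent's PATH is untouched
```

The design mirrors the `callr::r()` example above: the conda-specific work happens in the child process, and only plain results come back to the session that uses R's GDAL.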
## Developers: R kernel in JupyterLab
How is the R kernel created so that it shows up in Jupyter Lab? You don't need to install R into the conda environment since it is already in the image. We just need to use the `IRkernel` R package to register the kernel with Jupyter. This is in `scripts/install_rocker.sh` and is built into py-rocket-base.
```
Rscript - <<-"EOF"
install.packages('IRkernel', lib = .Library) # install in system library