Commit 2b5920e

Merge pull request #6 from SumedhSankhe/update-dockerfile
update post
2 parents 4b9e74d + 7599ba7 commit 2b5920e

2 files changed

Lines changed: 103 additions & 56 deletions

README.md

Lines changed: 67 additions & 22 deletions
@@ -1,10 +1,12 @@
1-
# Optimizing R Shiny Docker Builds: A Multistage Approach
1+
# Optimizing R Shiny Docker Builds: From 40 Minutes to 10 Minutes
22

33
[![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=flat&logo=docker&logoColor=white)](https://www.docker.com/)
44
[![R](https://img.shields.io/badge/r-%23276DC3.svg?style=flat&logo=r&logoColor=white)](https://www.r-project.org/)
55
[![Shiny](https://img.shields.io/badge/Shiny-shinyapps.io-blue?style=flat&logo=RStudio&logoColor=white)](https://shiny.rstudio.com/)
66

7-
A practical demonstration of Docker optimization techniques for R Shiny applications, showing how multistage builds can reduce image size by 40-50% and improve build times through better layer caching.
7+
A practical demonstration of Docker optimization techniques for R Shiny applications, showing how multistage builds with rocker/r2u can reduce image size by 25% and improve build times by 80-94% through better layer caching and binary package installation.
8+
9+
> **Blog Post:** [Read the full story](./blog-post.md) about optimizing Docker builds for a customer-facing R Shiny SaaS application running on Kubernetes.
810
911
## The Problem
1012

@@ -59,21 +61,31 @@ docker run -p 3838:3838 shiny-app:optimized
5961

6062
## Performance Comparison
6163

62-
| Metric | Single-Stage | Multistage | Improvement |
63-
|--------|-------------|------------|-------------|
64-
| **Image Size** | ~2.1 GB | ~1.2 GB | **43% smaller** |
65-
| **Build Time (cold)** | ~15 min | ~12 min | **20% faster** |
66-
| **Build Time (cached)** | ~8 min | ~30 sec | **94% faster** |
67-
| **Layers** | 12 | 8 (runtime) | **Cleaner** |
64+
Results from GitHub Container Registry (verified via GitHub Actions):
65+
66+
| Metric | Single-Stage | Two-Stage | Three-Stage | Improvement |
67+
|--------|-------------|-----------|-------------|-------------|
68+
| **Image Size (GHCR)** | 1.27 GB | 948 MB | 948 MB | **25% smaller** |
69+
| **Build Time (warm)** | 5-7 mins | ~30 sec | ~30 sec | **92-94% faster** |
70+
| **Build Time (cold)** | 8-10 mins | 6-8 mins | 6-8 mins | **20-25% faster** |
6871

69-
*Results may vary based on dependencies and hardware*
72+
**Key Features:**
73+
- Uses **rocker/r2u** for binary R package installation (faster than source compilation)
74+
- **Layer caching** separates dependencies from application code
75+
- **Multistage builds** exclude build tools from runtime image
76+
- **Production-ready** pattern used in customer-facing SaaS applications
77+
78+
*See [blog-post.md](./blog-post.md) for production results with 200+ packages (1.5GB → 875MB, 42% reduction)*
7079

7180
## Architecture Deep Dive
7281

7382
### Single-Stage Build (Before)
7483

7584
```dockerfile
76-
FROM rocker/r-ver:4.3.2
85+
FROM rocker/r2u:24.04
86+
87+
# Configure renv cache
88+
ENV RENV_PATHS_CACHE="/app/renv/.cache"
7789

7890
# Install system dependencies
7991
RUN apt-get update && apt-get install -y \
@@ -97,34 +109,50 @@ RUN R -e "renv::restore()"
97109

98110
```dockerfile
99111
# ============ STAGE 1: Builder ============
100-
FROM rocker/r-ver:4.3.2 AS builder
112+
FROM rocker/r2u:24.04 AS builder
113+
114+
# Configure renv cache to use consistent path
115+
ENV RENV_PATHS_CACHE="/app/renv/.cache"
101116

102117
# Install build dependencies
103118
RUN apt-get update && apt-get install -y ...
104119

120+
WORKDIR /app
121+
105122
# Copy ONLY dependency files first (cached layer)
106123
COPY renv.lock renv.lock
107124
COPY .Rprofile .Rprofile
108125
COPY renv/activate.R renv/activate.R
126+
COPY renv/settings.json renv/settings.json
109127

110128
# Install packages (cached unless renv.lock changes)
129+
RUN R -e "install.packages('renv', repos='https://cloud.r-project.org')"
111130
RUN R -e "renv::restore()"
112131

113132
# Copy code AFTER dependencies
114133
COPY app.R app.R
115134

116135
# ============ STAGE 2: Runtime ============
117-
FROM rocker/r-ver:4.3.2
136+
FROM rocker/r2u:24.04
137+
138+
# Configure renv cache to match builder
139+
ENV RENV_PATHS_CACHE="/app/renv/.cache"
118140

119141
# Install ONLY runtime dependencies
120142
RUN apt-get update && apt-get install -y \
121143
# Note: no -dev packages
libcurl4 \
122144
libssl3 \
123145
...
124146

125-
# Copy from builder
126-
COPY --from=builder /build/renv /app/renv
127-
COPY --from=builder /build/app.R /app/app.R
147+
WORKDIR /app
148+
149+
# Copy from builder (includes renv cache)
150+
COPY --from=builder /app/renv /app/renv
151+
COPY --from=builder /app/.Rprofile /app/.Rprofile
152+
COPY --from=builder /app/renv.lock /app/renv.lock
153+
COPY --from=builder /app/app.R /app/app.R
154+
155+
CMD ["R", "--vanilla", "-e", ".libPaths('/app/renv/library/linux-ubuntu-noble/R-4.5/x86_64-pc-linux-gnu'); shiny::runApp('/app', host='0.0.0.0', port=3838)"]
128156
```
129157

130158
**Improvements:**
@@ -205,15 +233,20 @@ shiny-docker-optimization/
205233
RUN apt-get install -y libpq5 # Runtime
206234
```
207235

208-
3. **Adjust base image** if needed:
236+
3. **Choose the right base image:**
209237
```dockerfile
210-
# Use specific R version
211-
FROM rocker/r-ver:4.3.1 AS builder
212-
213-
# Or use rocker/shiny for built-in Shiny Server
214-
FROM rocker/shiny:4.3.2 AS builder
238+
# Recommended: rocker/r2u for binary packages (faster builds)
239+
FROM rocker/r2u:24.04 AS builder
240+
241+
# Alternative: rocker/r-ver for source compilation
242+
FROM rocker/r-ver:4.5.2 AS builder
243+
244+
# For Shiny Server (if not using Kubernetes)
245+
FROM rocker/shiny:4.5.2 AS builder
215246
```
216247

248+
**Note:** rocker/r2u provides pre-compiled binary packages, dramatically reducing build times compared to source compilation. Highly recommended for production use.
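
As a minimal sketch of what binary installation looks like in practice: the package names below are illustrative and not taken from this repository, and the sketch assumes the `rocker/r2u` image has `bspm` enabled by default, as the r2u documentation describes.

```dockerfile
# Sketch only: package names are illustrative, not this repository's dependency list.
FROM rocker/r2u:24.04

# Option 1: with bspm enabled (assumed default in rocker/r2u), install.packages()
# is routed through apt and pulls pre-built binaries plus their system libraries.
RUN R -e "install.packages(c('shiny', 'dplyr'))"

# Option 2: install the same kind of binary directly with apt.
# r2u names its packages r-cran-<lowercase package name>.
RUN apt-get update \
    && apt-get install -y --no-install-recommends r-cran-ggplot2 \
    && rm -rf /var/lib/apt/lists/*
```

Either route avoids compiling packages from source, which is where most of the build time goes on `rocker/r-ver`.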
249+
217250
### Production Considerations
218251

219252
- **Health checks**: Add Docker health checks for production
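
A health check for the runtime stage could look like the sketch below. It assumes the app listens on port 3838 as in the `CMD` shown earlier and that adding `curl` to the runtime image is acceptable. The interval, timeout, start period, and retry values are illustrative and would need tuning.

```dockerfile
# Sketch only: lines to append to the runtime stage.
# Assumes curl is not already present in the image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Mark the container unhealthy if the Shiny app stops answering HTTP requests.
HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
    CMD curl -fsS http://localhost:3838/ > /dev/null || exit 1
```

Kubernetes does not act on the Dockerfile `HEALTHCHECK`. On AKS, the equivalent check would be configured as liveness and readiness probes in the pod spec.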
@@ -223,7 +256,9 @@ shiny-docker-optimization/
223256

224257
## Related Resources
225258

259+
- **[Blog Post](./blog-post.md)**: Full story of optimizing Docker builds for production R Shiny SaaS
226260
- [Docker Multistage Builds Documentation](https://docs.docker.com/build/building/multi-stage/)
261+
- [r2u: CRAN as Ubuntu Binaries](https://eddelbuettel.github.io/r2u/) - Binary R packages for Ubuntu
227262
- [renv: R Dependency Management](https://rstudio.github.io/renv/)
228263
- [Rocker Project: Docker Images for R](https://rocker-project.org/)
229264
- [Production-Grade Shiny Apps](https://engineering-shiny.org/)
@@ -249,6 +284,16 @@ MIT License - see LICENSE file for details
249284

250285
---
251286

252-
If you found this helpful, consider starring the repository!
287+
**If you found this helpful, consider starring the repository!**
253288

254289
Built with practical experience from deploying production R Shiny applications on Azure Kubernetes Service.
290+
291+
## Real-World Impact
292+
293+
This demo repository showcases the optimization pattern. In production at Alamar Biosciences:
294+
- **NULISA Analysis Software (NAS)**: Customer-facing SaaS with 200+ R packages
295+
- **Image size**: Reduced from 1.5GB to 875MB (42% smaller)
296+
- **Build times**: 40 minutes → 8-15 minutes for code changes (80% faster)
297+
- **Deployment**: Running on Azure Kubernetes Service, serving customers globally
298+
299+
Read the [full blog post](./blog-post.md) for details on the production setup including base image management, cache-busting strategies, and CI/CD integration.

blog-post.md

Lines changed: 36 additions & 34 deletions
@@ -1,20 +1,20 @@
11
---
2-
title: "Optimizing R Shiny Docker Builds: From 40 Minutes to 10 Minutes"
2+
title: "Optimizing R Shiny Docker Builds: Warm vs Cold Build Strategy"
33
date: 2024-12-16
44
author: Sumedh R. Sankhe
55
tags: [Docker, R, Shiny, DevOps, Performance, Kubernetes, SaaS]
6-
description: "How we reduced Docker build times by 80% and image sizes by 42% for a customer-facing R Shiny SaaS application using multistage builds with rocker/r2u"
6+
description: "How we optimized Docker builds for a customer-facing R Shiny SaaS application by separating code changes (8-15 min) from dependency changes (40 min) and reduced image sizes by 42%"
77
---
88

9-
# Optimizing R Shiny Docker Builds: From 40 Minutes to 10 Minutes
9+
# Optimizing R Shiny Docker Builds: Warm vs Cold Build Strategy
1010

11-
This is my first time writing up a technical blog post, so bear with me. I'm going to share what I learned optimizing Docker builds for R Shiny apps, including all the mistakes I made along the way.
11+
This is my first time writing up a technical blog post, so bear with me. I'm going to share what we learned optimizing Docker builds for R Shiny apps, including the things that broke along the way.
1212

13-
At Alamar Biosciences, I work on the NULISA Analysis Software (NAS) - basically a Shiny app for analyzing proteomics data. Unlike typical internal Shiny apps deployed with Posit Connect, NAS is a **customer-facing SaaS application** running on Azure Kubernetes Service (AKS). Our customers access it directly for analyzing their proteomics experiments, which means deployment speed, reliability, and scalability matter differently than for internal tools.
13+
At Alamar Biosciences, I work on the NULISA Analysis Software (NAS) - a **large-scale, customer-facing SaaS application** for analyzing proteomics data, built with R Shiny and running on Azure Kubernetes Service (AKS). NAS is used by customers across academia and industry worldwide as a free cloud service. Unlike typical internal Shiny apps deployed with Posit Connect, NAS serves external customers directly, which means deployment speed, reliability, and scalability are critical for customer satisfaction and platform availability.
1414

15-
When I started, our Docker builds were painfully slow. Like, grab coffee, queue some villagers in AoE2, maybe respond to some emails slow. A simple one-line code change? Wait 40 minutes for Docker to rebuild the image. Our images were pushing 1.5GB compressed. Every deployment to Kubernetes felt like it took forever. When you're shipping features to customers and fixing production bugs, 40-minute build times kill your velocity.
15+
Our Docker builds were taking 20-25 minutes. Every. Single. Build. It didn't matter if you changed one line of code or overhauled the entire data processing pipeline—Docker would reinstall all 200+ R packages from scratch. A simple bug fix? Wait 25 minutes. Testing a UI tweak? Another 25 minutes. Our images were pushing 1.5GB compressed to Azure Container Registry. When you're shipping features to customers and fixing production bugs, treating every build the same kills your velocity.
1616

17-
This post walks through how I cut Docker build times by 80% for code changes (40 mins → 8-15 mins) and reduced image sizes by 42% (1.5GB → 875MB). The optimization involves separating build from runtime using Docker multistage builds and leveraging rocker/r2u for binary package installation. More importantly, this post covers the things that broke along the way and how I fixed them.
17+
This post walks through the optimizations we implemented in late 2025 that fundamentally changed how we build Docker images. The key insight: **separate code changes from dependency changes**. Now, the common case (code changes) builds in 8-15 minutes—a 60-68% improvement—while dependency updates take longer (40 mins) but happen infrequently. We also reduced image sizes by 42% (1.5GB → 870MB compressed in ACR). The optimization involves splitting stable CRAN dependencies into a base image, leveraging rocker/r2u for binary package installation, and properly structuring multistage builds. More importantly, this post covers the things that broke along the way and how we fixed them.
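
To make the base image idea concrete, here is a rough sketch of how such a split could look. It is not our production Dockerfile: `BASE_IMAGE` and the registry name are hypothetical, and the sketch assumes the base image was built separately from `renv.lock` and already contains the restored renv library, `.Rprofile`, and `renv.lock` under `/app`.

```dockerfile
# Sketch only. BASE_IMAGE is a hypothetical, separately built image that
# already holds the restored renv library under /app/renv.
ARG BASE_IMAGE=example.azurecr.io/shiny-base:latest

# ---- Builder: starts from the pre-built dependency image ----
FROM ${BASE_IMAGE} AS builder
WORKDIR /app
# Only application code is copied here, so a code change never reinstalls packages.
COPY app.R app.R

# ---- Runtime: same layout as the demo's runtime stage ----
FROM rocker/r2u:24.04
ENV RENV_PATHS_CACHE="/app/renv/.cache"
WORKDIR /app
COPY --from=builder /app/renv /app/renv
COPY --from=builder /app/.Rprofile /app/.Rprofile
COPY --from=builder /app/renv.lock /app/renv.lock
COPY --from=builder /app/app.R /app/app.R
```

With this layout, the base image is rebuilt only when dependencies change (the cold path), while everyday code changes reuse it (the warm path).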
1818

1919
## The Problem: Slow, Bloated Docker Images
2020

@@ -58,13 +58,13 @@ Our final images had everything needed to build the app, not just run it. All th
5858
Here's what our Azure Kubernetes deployment looked like:
5959
1. Push a code change (bug fix, new feature, etc.)
6060
2. GitHub Actions starts building
61-
3. Wait 40 minutes (go get coffee, check Slack, lose focus)
61+
3. Wait 20-25 minutes (go get coffee, check Slack, lose focus)
6262
4. Push to Azure Container Registry
6363
5. Deploy to AKS
6464

6565
Those build times killed productivity. You'd push a fix, then switch to something else while waiting. By the time the build finished, you'd forgotten what you were even working on. It was like waiting for a Wonder to be built while your opponents are rushing you with trebuchets.
6666

67-
**Why this matters for SaaS:** With Posit Connect, you typically deploy once and iterate internally. With a customer-facing SaaS on Kubernetes, you're constantly shipping features, bug fixes, and updates. Fast build times directly impact how quickly you can respond to customer issues and ship improvements. A 40-minute build cycle means you can only deploy 2-3 times per day max. That's not acceptable for modern SaaS development.
67+
**Why this matters for SaaS:** With Posit Connect, you typically deploy once and iterate internally. With a customer-facing SaaS on Kubernetes, you're constantly shipping features, bug fixes, and updates. Fast build times directly impact how quickly you can respond to customer issues and ship improvements. A 25-minute build cycle for every code change means you can only deploy a handful of times per day. That's not acceptable for modern SaaS development.
6868

6969
## The Solution: Multistage Docker Builds
7070

@@ -198,24 +198,32 @@ Here are the real numbers from our GitHub Actions workflow on the demo repo:
198198

199199
| Metric | Single-Stage | Two-Stage | Three-Stage | Improvement |
200200
|--------|-------------|-----------|-------------|-------------|
201-
| **Image size (uncompressed)** | 1.89 GB | 1.44 GB | 1.44 GB | **24% smaller** |
202-
| **Image size (compressed/registry)** | 512 MB | 403 MB | 403 MB | **21% smaller** |
203-
| **Warm build** (code change only) | 5-7 mins | 30-45s | 30-45s | **85-92% faster** |
201+
| **Image size (GHCR)** | 1.27 GB | 948 MB | 948 MB | **25% smaller** |
202+
| **Warm build** (code change only) | 5-7 mins | ~30s | ~30s | **92-94% faster** |
204203
| **Cold build** (no cache) | 8-10 mins | 6-8 mins | 6-8 mins | 20-25% faster |
205204

206205
For our production NAS application with 200+ CRAN packages and 2 custom in-house packages:
207206

208-
| Metric | Before (Single-Stage) | After (Three-Stage) | Improvement |
209-
|--------|----------------------|---------------------|-------------|
210-
| **Image size (compressed)** | 1.5 GB | 875 MB | **42% smaller** |
211-
| **Cold build** (all layers) | 40 mins | 30-35 mins | **20-25% faster** |
212-
| **Warm build** (code change only) | 40 mins | 8-15 mins | **70-80% faster** |
207+
| Metric | Before (NAS 1.3) | After (NAS 1.4) | Improvement |
208+
|--------|------------------|-----------------|-------------|
209+
| **Image size (ACR compressed)** | 1.5 GB | 870 MB | **42% smaller** |
210+
| **Warm build** (code change only) | 20-25 mins | 8-15 mins | **60-68% faster** |
211+
| **Cold build** (dependency change) | 20-25 mins | 40 mins | Slower, but infrequent |
213212

214213
**Note:** Our production setup includes additional optimizations beyond the scope of this post: automated base image rebuilds triggered by renv.lock hash changes, cache-busting strategies for custom package updates, and GitHub Actions runner cleanup for multi-stage builds. These advanced CI/CD integrations will be covered in a follow-up post.
215214

216-
**Note on image sizes:** Container registries (like Azure Container Registry, Docker Hub, GitHub Container Registry) store images in compressed format, which is why the "compressed/registry" sizes are significantly smaller than what you see locally with `docker images`. When you push an image to a registry, Docker compresses the layers, typically achieving 25-35% of the uncompressed size. This is important when considering deployment times and registry storage costs.
215+
**Note on image sizes:** The sizes shown are **compressed sizes** as stored in container registries (ACR/GHCR). These are the sizes that matter for:
216+
- Registry storage costs
217+
- Network transfer time during push/pull
218+
- Initial deployment speed to Kubernetes
217219

218-
**Note on warm builds:** Even single-stage Dockerfiles can have warm builds if you structure the layers correctly. The problem is that most single-stage setups use `COPY . .` early, which invalidates package installation on every code change. Our original single-stage Dockerfile had this issue, which is why warm builds were still slow.
220+
Container registries store image layers compressed, at about 25-35% of their uncompressed size. So the 870MB compressed NAS image is roughly 2.5-3.5GB when uncompressed on disk. The demo app achieves a 25% reduction in registry size, while our production NAS app sees a 42% reduction (1.5GB → 870MB in ACR).
221+
222+
**Note on warm vs cold builds:** The key optimization is **distinguishing between these two scenarios**. Our original setup treated every build the same—changing one line of code triggered a full 20-25 minute rebuild with all packages reinstalled. The optimized approach separates:
223+
- **Warm builds** (90% of builds): Code changes only → 8-15 mins
224+
- **Cold builds** (10% of builds): Dependency changes → 40 mins (longer, but comprehensive and cached)
225+
226+
Yes, cold builds are now slower, but they happen rarely (when you add/update packages). The common case (shipping code) is 60-68% faster.
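
As a rough check of that trade-off, assuming the 90/10 split and the build times quoted above hold, the expected time per build works out as:

```latex
\[
\begin{aligned}
\text{before} &: 20 \text{ to } 25 \text{ min per build} \\
\text{after, warm builds at 8 min} &: 0.9 \times 8 + 0.1 \times 40 = 11.2 \text{ min} \\
\text{after, warm builds at 15 min} &: 0.9 \times 15 + 0.1 \times 40 = 17.5 \text{ min}
\end{aligned}
\]
```

So even with the slower cold builds included, the average build stays below the old 20-25 minutes. The 60-68% figure corresponds to the 8-minute end of the warm range (1 − 8/20 = 60%, 1 − 8/25 = 68%). At 15 minutes the saving is 25-40%.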
219227

220228
The full CI/CD pipeline includes additional steps beyond the Docker build: running unit tests, extracting test results, publishing them to GitHub, security scanning, etc. That's why the end-to-end time is longer than just the Docker build. The unit testing integration will be covered in a separate blog post.
221229

@@ -235,12 +243,16 @@ RUN R -e "renv::restore()" # So this has to run again. Every time.
235243

236244
**The right way:**
237245
```dockerfile
238-
COPY renv.lock . # Only changes when you add/remove packages
239-
RUN R -e "renv::restore()" # Gets cached and reused for code changes
240-
COPY app.R . # Changes all the time, but doesn't break cache above
246+
# renv.lock only changes when you add/remove packages (COLD build)
COPY renv.lock .
247+
# This layer is cached and reused for code changes (enables WARM builds)
RUN R -e "renv::restore()"
248+
# app.R changes all the time, but doesn't break the cache above (WARM build)
COPY app.R .
241249
```
242250
243-
That simple reordering is the entire reason warm builds went from 8-10 minutes to 30 seconds. Put your stable stuff first, your frequently changing stuff last.
251+
That simple reordering creates the warm/cold build distinction:
252+
- **Warm build**: `app.R` changes, `renv.lock` unchanged → `renv::restore()` layer is cached, build takes 30s
253+
- **Cold build**: `renv.lock` changes → `renv::restore()` runs, takes 6-8 mins for the demo app (40 mins for NAS with 200+ packages)
254+
255+
Put your stable stuff first, your frequently changing stuff last.
244256
245257
## How This Works in Production
246258
@@ -295,17 +307,7 @@ RUN --mount=type=secret,id=github_pat \
295307

296308
This keeps secrets out of your layers.
297309

298-
### 2. Parallel Package Installation
299-
300-
This is a small win but it adds up:
301-
302-
```dockerfile
303-
RUN R -e "options(Ncpus = 4); renv::restore()"
304-
```
305-
306-
Uses 4 cores to compile packages instead of 1. On a GitHub Actions runner, this shaved off another minute or two.
307-
308-
### 3. Pick the Right Base Image
310+
### 2. Pick the Right Base Image
309311

310312
Use `rocker/r2u` for significantly faster package installation through binary packages. This is especially beneficial for large projects with many dependencies.
311313
