| 1 | +# Data Engineer Server |
1 | 2 |
2 | | -# Data Science Server |
| 3 | + |
| 4 | + |
| 5 | + |
| 6 | + |
3 | 7 |
4 | | -_The list of shell scripts for configuration Data Science server on Ubuntu Server 24.04_. |
| 8 | +- [Repository structure](#repository-structure) |
| 9 | +- [Features](#features) |
| 10 | +- [Prerequisites](#prerequisites) |
| 11 | +- [Quick start](#quick-start) |
| 12 | + - [Servers](#servers) |
| 13 | + - [Clients](#clients) |
| 14 | +- [Configuration](#configuration) |
| 15 | + - [Environment variables](#environment-variables) |
| 16 | +- [Ports](#ports) |
| 17 | +- [Troubleshooting](#troubleshooting) |
| 18 | +- [Development helpers (Ubuntu 24.04)](#development-helpers-ubuntu-2404) |
| 19 | +- [Notes on reverse proxy (Traefik)](#notes-on-reverse-proxy-traefik) |
| 20 | +- [Contributing](#contributing) |
5 | 21 |
6 | | -## Software / Frameworks |
7 | 22 |
8 | | -Installing software/frameworks: |
| 23 | +Local-first data engineering and AI/DS workspace with dockerized services, helper scripts, and Python clients. |
9 | 24 |
10 | | -- [x] ML/DL frameworks: |
11 | | - - [x] Tensorflow (with GPU support) |
12 | | - - [x] Keras (with GPU support) |
13 | | - - [x] LightGBM (with GPU support) |
14 | | - - [x] H2O Open |
15 | | -- [x] R CRAN |
16 | | - - [x] with pre-installed basic R-packages |
17 | | -- [x] RStudio Server |
18 | | - - [x] with Azure Database Service connector |
19 | | -- [x] JupyterLab |
20 | | -- [x] .NET Core SDK |
21 | | -- [x] Docker |
22 | | -- [x] Git configure |
| 25 | +- AI agents (Ollama + Open WebUI) |
| 26 | +- Object storage (MinIO) with event bus (RabbitMQ) |
| 27 | +- Centralized structured logging (Seq) |
| 28 | +- RStudio Server for R analytics |
| 29 | +- Ubuntu provisioning notes and developer setup scripts |
23 | 30 |
24 | | -## Preparation |
25 | 31 |
26 | | -```sh |
27 | | -git clone https://github.com/codez0mb1e/cloud-deep-learning-server.git |
| 32 | +## Repository structure |
28 | 33 |
29 | | -cd cloud-deep-learning-server/src |
30 | | -mkdir logs |
31 | 34 | ``` |
| 35 | +. |
| 36 | +├── ai-agents/        # Local AI stack (Ollama + Open WebUI) and model bootstrap |
| 37 | +├── development/      # One-off scripts to set up dev tools on Ubuntu |
| 38 | +├── minio/            # MinIO object storage: server compose + Python client |
| 39 | +├── pipelines/        # AutoML pipeline docs and diagrams (conceptual) |
| 40 | +├── rstudio-server/   # RStudio Server via Docker Compose (+ R frameworks notes) |
| 41 | +├── seq/              # Seq logging: server compose + Python client |
| 42 | +└── ubuntu-os/        # Ubuntu 24.04 tips, packages, disks/network, users |
| 43 | +``` |
| 44 | + |
| 45 | + |
| 46 | +## Features |
| 47 | + |
| 48 | +- Local AI stack with Ollama and Open WebUI for chat and coding assistance |
| 49 | +- RStudio Server for R analytics and data science |
| 50 | +- MinIO S3-compatible object storage with optional RabbitMQ notifications |
| 51 | +- Seq centralized logging with structured logs |
| 52 | +- Minimal Python clients for MinIO and Seq |
| 53 | +- Clear, scriptable startup via Docker Compose and small helper scripts |
| 54 | + |
| 55 | + |
| 56 | +## Prerequisites |
| 57 | + |
| 58 | +- Linux (tested on Ubuntu Server 24.04) |
| 59 | +- Docker Engine and Docker Compose plugin |
| 60 | +- Optional: Python 3.11+ for the MinIO/Seq client examples |
| 61 | + |
| 62 | + |
| 63 | +## Quick start |
| 64 | + |
| 65 | +### Servers |
| 66 | + |
| 67 | +#### 1. AI agents: Ollama + Open WebUI |
| 68 | + |
| 69 | +Folder: `ai-agents/` — see more in that folder's README. |
| 70 | + |
| 71 | +```bash |
| 72 | +cd ai-agents |
| 73 | + |
| 74 | +# Start services (Ollama + Open WebUI; model-downloader will fetch baseline models) |
| 75 | +docker compose up -d |
| 76 | + |
| 77 | +# Watch model downloads |
| 78 | +docker compose logs -f model-downloader |
| 79 | + |
| 80 | +# Open the chat UI |
| 81 | +# http://localhost:3000 |
| 82 | +``` |
| 83 | + |
| 84 | +Services: |
| 85 | +- Ollama: http://localhost:11434 |
| 86 | +- Open WebUI: http://localhost:3000 |
| 87 | + |
| 88 | + |
| 89 | +#### 2. RStudio Server |
| 90 | + |
| 91 | +Folder: `rstudio-server/server/` |
| 92 | + |
| 93 | +```bash |
| 94 | +cd rstudio-server/server |
| 95 | + |
| 96 | +# Set a password for the rstudio user |
| 97 | +echo "RSTUDIO_PASSWORD=<your-password>" > .env |
| 98 | + |
| 99 | +# Start RStudio Server |
| 100 | +docker compose up -d |
| 101 | + |
| 102 | +# Open the IDE |
| 103 | +# http://localhost:8787 (username: rstudio, password: from .env) |
| 104 | +``` |
| 105 | + |
| 106 | +Notes: |
| 107 | +- Port is bound to 127.0.0.1:8787 by default. |
| 108 | +- The compose file mounts `/home/${USER}/` into the container at `/home/rstudio/`. |
| 109 | +- Extra tips and R package guidance live in `rstudio-server/ds-frameworks/README.md`. |
| 110 | + |
| 111 | + |
| 112 | +#### 3. MinIO object storage (+ RabbitMQ for notifications) |
| 113 | + |
| 114 | +Folder: `minio/server/` |
| 115 | + |
| 116 | +```bash |
| 117 | +cd minio/server |
| 118 | + |
| 119 | +# Required secrets |
| 120 | +cat > .env << 'EOF' |
| 121 | +MINIO_ROOT_PASSWORD=<strong-password> |
| 122 | +RABBITMQ_ROOT_PASSWORD=<strong-password> |
| 123 | +# Optional, only used if you run behind Traefik |
| 124 | +# PRIMARY_DOMAIN=example.com |
| 125 | +EOF |
| 126 | + |
| 127 | +# Create network/volumes and start services |
| 128 | +bash ./run.sh |
| 129 | + |
| 130 | +# MinIO S3 API: http://localhost:9000 |
| 131 | +# RabbitMQ UI: http://localhost:15672 (user: admin, pass: from .env) |
| 132 | +``` |
| 133 | + |
| 134 | +Notes: |
| 135 | +- The MinIO Console runs on port 9001 inside the container. It's exposed via Traefik labels if you have a proxy configured; no direct host port is published here. |
| 136 | + |
| 137 | + |
| 138 | +#### 4. Seq centralized logging |
| 139 | + |
| 140 | +Folder: `seq/server/` |
| 141 | + |
| 142 | +```bash |
| 143 | +cd seq/server |
| 144 | + |
| 145 | +# Start (creates network/volume and launches the container) |
| 146 | +bash ./run.sh |
| 147 | + |
| 148 | +# Access |
| 149 | +# This compose file is set up for a Traefik reverse proxy (labels only). |
| 150 | +# To reach the UI, publish a port or configure Traefik with PRIMARY_DOMAIN. |
| 151 | +``` |
| 152 | + |
| 153 | + |
| 154 | +### Clients |
| 155 | + |
| 156 | +#### MinIO client |
| 157 | + |
| 158 | +Folder: `minio/client/` |
| 159 | + |
| 160 | +```bash |
| 161 | +cd minio/client |
| 162 | +python3 -m venv .venv && source .venv/bin/activate |
| 163 | +pip install -r requirements.txt |
| 164 | +``` |
| 165 | + |
| 166 | +See `minio/client/clients.py` for Pandas/Polars put/get helpers. |
| 167 | + |
| 168 | + |
| 169 | +#### Seq logger client |
| 170 | + |
| 171 | +Folder: `seq/client/` |
| 172 | + |
| 173 | +```bash |
| 174 | +cd seq/client |
| 175 | +python3 -m venv .venv && source .venv/bin/activate |
| 176 | +pip install -r requirements.txt |
| 177 | +``` |
| 178 | + |
| 179 | +Configure `seq/client/config.yml` with your Seq endpoint and API key, then wire up a logger with `LoggerFactory` from `seq_logger.py`. |
| 180 | + |
| 181 | + |
| 182 | +## Configuration |
| 183 | + |
| 184 | +Most services read configuration from simple `.env` files or from environment variables set inline in the compose files. Keep secrets out of version control. |
| 185 | + |
| 186 | +### Environment variables |
| 187 | + |
| 188 | +- AI Agents (`ai-agents/.env`) |
| 189 | + - `WEBUI_SECRET_KEY` — optional secret for Open WebUI (set if enabling auth) |
| 190 | +- RStudio (`rstudio-server/server/.env`) |
| 191 | + - `RSTUDIO_PASSWORD` — password for the `rstudio` user |
| 192 | +- MinIO/RabbitMQ (`minio/server/.env`) |
| 193 | + - `MINIO_ROOT_PASSWORD` — MinIO root password |
| 194 | + - `RABBITMQ_ROOT_PASSWORD` — RabbitMQ admin password |
| 195 | + - `PRIMARY_DOMAIN` — optional, used by Traefik labels |
| 196 | +- Seq client (`seq/client/config.yml`) |
| 197 | + - `logger_settings.seq.server_url`, `api_key` — endpoint and API key |
| 198 | + |
| 199 | + |
| 200 | +## Ports |
| 201 | + |
| 202 | +- Open WebUI: 3000 (host) |
| 203 | +- Ollama: 11434 (host) |
| 204 | +- RStudio Server: 8787 (bound to 127.0.0.1) |
| 205 | +- MinIO S3 API: 9000 (host) |
| 206 | +- RabbitMQ: 5672 (AMQP), 15672 (management UI) |
| 207 | +- Seq: not exposed by default (Traefik labels included; add ports or a proxy) |
| 208 | + |
| 209 | + |
| 210 | +## Troubleshooting |
| 211 | + |
| 212 | +- Port already in use |
| 213 | + - Check with `lsof -i :PORT`, then stop the conflicting process or change the port mapping in the compose file. |
| 214 | +- Docker permission denied |
| 215 | + - Add your user to the `docker` group: `sudo usermod -aG docker $USER && newgrp docker`. |
| 216 | +- Models downloading slowly (AI Agents) |
| 217 | + - Watch `model-downloader` logs; ensure adequate bandwidth and disk space. |
| 218 | +- RStudio login issues |
| 219 | + - Ensure `.env` has `RSTUDIO_PASSWORD` and the service is reachable on 127.0.0.1:8787. |
| 220 | +- MinIO/RabbitMQ not starting |
| 221 | + - Verify `.env` secrets and that the `backend` Docker network exists (created by the run script). |
| 222 | + |
| 223 | +## Development helpers (Ubuntu 24.04) |
| 224 | + |
| 225 | +Folder: `development/` — curated scripts for setting up a workstation/server. |
| 226 | + |
| 227 | +- `install_docker.sh` — Docker Engine + Compose |
| 228 | +- `install_conda.sh` — Miniconda |
| 229 | +- `uv_and_ruff.sh` — Python packaging (uv) and linting (ruff) |
| 230 | +- `install_dotnet_tools.sh` — .NET SDK |
| 231 | +- `install_azure_tools.sh` — Azure CLI/tools |
| 232 | +- `git_configure.sh` — Git username/email and quality-of-life settings |
| 233 | +- plus others for CI/CD, system design, and optional tools |
| 234 | + |
| 235 | +General OS tips live in `ubuntu-os/README.md` (packages, disks, network, users, and more). |
| 236 | + |
32 | 237 |
33 | | -## Installation |
| 238 | +## Notes on reverse proxy (Traefik) |
34 | 239 |
35 | | -1. sh [install_core.sh](/src/install_core.sh) &>logs/install_core.log |
36 | | -2. sh [install_docker.sh](/src/install_docker.sh) &>logs/install_docker.log |
37 | | -3. sh [git_configure.sh](/src/git_configure.sh) &>logs/git_configure.log <sup>1</sup> |
38 | | -4. sh [install_dotnet_tools.sh](/src/install_dotnet_tools.sh) &>logs/install_dotnet_core.log |
39 | | -5. sh [install_ds_python.sh](/src/install_ds_python.sh) > log/install_ds_python.log |
40 | | -6. sh [install_deep_learning.sh](/src/install_deep_learning.sh) &>logs/install_deep_learning.log |
41 | | -7. sh [install_r_env.sh](/src/install_r_env.sh) &>logs/install_r.log |
42 | | -8. pt [install_r_packages.R](/src/install_r_packages.R) &>logs/install_r_packages.log <sup>1</sup> |
43 | | -9. sh [install_lightgbm.sh](/src/install_lightgbm.sh) &>logs/install_lightgbm.log <sup>1</sup> |
| 240 | +Some compose files include Traefik labels and refer to `PRIMARY_DOMAIN`, set in a `.env` file. If you're not running a reverse proxy, you can still reach the services through the published ports listed above, or add explicit `ports:` mappings to the compose files. |
44 | 241 |
45 | | -<sup>1</sup> Install under RStudio user |
46 | 242 |
47 | | -### Tests |
| 243 | +## Contributing |
48 | 244 |
49 | | -1. [Keras installation tests](/tests/keras_install_tests.R) |
50 | | -1. [LightGBM installation test](/tests/lightgbm_install_tests.R) |
51 | | -1. [Jupyter Notebook installation tests](/tests/hello_jupyter.ipynb) |
| 245 | +Contributions are welcome! If you spot an issue or have an improvement: |
| 246 | +- Open an issue describing the problem or proposal |
| 247 | +- For changes, fork the repo and open a PR with a concise description and testing notes |
| 248 | +- Keep changes focused and documented; prefer small, reviewable PRs |
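The Quick start in the new README lists Ollama at `http://localhost:11434` but shows no request against it. Below is a minimal, stdlib-only Python sketch. It assumes Ollama's `/api/generate` endpoint with streaming disabled, and it assumes the port is published on localhost as the Services list states. The helper names are illustrative and do not come from the repository.

```python
import json
import urllib.request

# Assumption: Ollama is published on localhost:11434, as in the README's Services list.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the 'response' field."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["response"]
```

Once the stack is up, a call such as `generate("llama3.2", "Say hello in one sentence.")` returns the reply as a string. The model name `llama3.2` is a placeholder, so substitute one that the `model-downloader` service actually pulled. Disabling streaming keeps the sketch to a single JSON response, at the cost of waiting for the full answer.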
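The Seq client section depends on the repository's `LoggerFactory`, which is not shown in this diff. The stdlib-only Python sketch below is independent of that implementation and posts events to Seq in CLEF (compact log event format) over HTTP. It assumes Seq's raw ingestion endpoint `/api/events/raw?clef` and the `X-Seq-ApiKey` header. It also uses a placeholder base URL, because the Seq compose file publishes no host port by default.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import List, Optional

# Assumption: placeholder URL; the Seq compose file publishes no port by default,
# so point this at whatever port mapping or Traefik route you configure.
SEQ_URL = "http://localhost:5341"
CLEF_CONTENT_TYPE = "application/vnd.serilog.clef"


def to_clef(message_template: str, level: str = "Information", **properties) -> str:
    """Serialize one event as a CLEF line (@t timestamp, @mt template, @l level)."""
    event = {
        "@t": datetime.now(timezone.utc).isoformat(),
        "@mt": message_template,
        "@l": level,
        **properties,
    }
    return json.dumps(event)


def build_ingest_request(
    events: List[str], api_key: Optional[str] = None, base_url: str = SEQ_URL
) -> urllib.request.Request:
    """Build a POST to the (assumed) raw CLEF endpoint with newline-delimited events."""
    headers = {"Content-Type": CLEF_CONTENT_TYPE}
    if api_key:
        headers["X-Seq-ApiKey"] = api_key
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/events/raw?clef",
        data="\n".join(events).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(
    events: List[str], api_key: Optional[str] = None, base_url: str = SEQ_URL, timeout: float = 10.0
) -> int:
    """POST the batch and return the HTTP status code."""
    request = build_ingest_request(events, api_key, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

For example, `send([to_clef("Loaded {RowCount} rows", RowCount=1200)], api_key="<api-key>")` posts one event, and `RowCount` arrives as a structured property rather than as plain text.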
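The Troubleshooting entries tell you to verify the `.env` secrets. The Python sketch below performs that check under two assumptions: the files use the simple `KEY=VALUE` form shown in the Quick start (this is not a full dotenv parser), and angle-bracket placeholders such as `<strong-password>` count as unset. The `REQUIRED_KEYS` mapping is illustrative and was built from the Environment variables list.

```python
from pathlib import Path
from typing import Dict, Set

# Illustrative mapping built from the README's Environment variables list.
REQUIRED_KEYS: Dict[str, Set[str]] = {
    "rstudio-server/server/.env": {"RSTUDIO_PASSWORD"},
    "minio/server/.env": {"MINIO_ROOT_PASSWORD", "RABBITMQ_ROOT_PASSWORD"},
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Deliberately minimal: no `export` prefix, no multi-line values, and
    surrounding quotes are stripped naively.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_text: str, required: Set[str]) -> Set[str]:
    """Return required keys that are absent, empty, or still a <placeholder>."""
    env = parse_env(env_text)
    return {
        key
        for key in required
        if not env.get(key) or (env[key].startswith("<") and env[key].endswith(">"))
    }


def check_repo(root: Path = Path(".")) -> Dict[str, Set[str]]:
    """Check each known .env file under root; a missing file reports all its keys."""
    report: Dict[str, Set[str]] = {}
    for rel_path, required in REQUIRED_KEYS.items():
        path = root / rel_path
        text = path.read_text() if path.is_file() else ""
        problems = missing_keys(text, required)
        if problems:
            report[rel_path] = problems
    return report
```

When run from the repository root, `check_repo()` returns an empty dict if every required secret is set. Otherwise, it maps each `.env` path to the keys that still need values. A value copied verbatim from `echo "RSTUDIO_PASSWORD=<your-password>" > .env` is reported as missing.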
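The Ports list and the "Port already in use" entry lend themselves to a quick probe. The stdlib Python sketch below attempts a TCP connection to each documented host port on 127.0.0.1. It reports only whether something is listening, not which process owns the port, so you still need `lsof -i :PORT` for that. The port map is copied from the Ports list, and Seq is omitted because it publishes no port by default.

```python
import socket
from typing import Dict

# Host ports copied from the README's Ports list (Seq omitted: no port published).
EXPECTED_PORTS: Dict[str, int] = {
    "Open WebUI": 3000,
    "Ollama": 11434,
    "RStudio Server": 8787,
    "MinIO S3 API": 9000,
    "RabbitMQ AMQP": 5672,
    "RabbitMQ management UI": 15672,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or unreachable: nothing usable is listening.
        return False


def port_report(host: str = "127.0.0.1") -> Dict[str, bool]:
    """Map each documented service to whether its host port accepts connections."""
    return {name: is_listening(port, host) for name, port in EXPECTED_PORTS.items()}
```

Before `docker compose up`, a `True` flags a port that another process already holds. After startup, a `False` points to a service that is not reachable on its expected port.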