[Enhancement] Enable LibreOffice document conversion in the sandbox

Hi @usnavy13,

I have a small patch (~76 lines, 4 commits) that enables LibreOffice document conversion to work inside the existing nsjail sandbox, plus a couple of related fixes I hit along the way. I wanted to discuss the direction before opening a PR.

## Context

LibreOffice is already installed in the upstream image (Dockerfile lines 54-55: `libreoffice-impress libreoffice-writer libreoffice-calc libreoffice-common`), but invoking `soffice --headless --convert-to pdf` from the sandboxed Python/Bash runtime currently fails with three distinct errors:

1. `/proc` not mounted: soffice prints `ERROR: /proc not mounted - LibreOffice is unlikely to work well if at all` because it requires `/proc` to be visible.
2. seccomp blocks the `bind` syscall: soffice's `oosplash` and `soffice.bin` communicate via AF_UNIX sockets internally.
3. Font discovery fails: `/etc/fonts` and `/usr/share/fonts` are not bind-mounted into the sandbox, so soffice falls back to a single internal font and produces unreadable output.

The consequence is that the LibreOffice binary ships in the image but is unusable from the sandbox in practice.
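
For reference, here is a minimal sketch of the kind of call that fails today. The helper names (`build_convert_command`, `convert`) and the timeout are my own for illustration, not code from the repo; the `soffice` flags are the standard headless conversion flags.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src: Path, outdir: Path, fmt: str = "pdf") -> list[str]:
    """Return the argv for a headless LibreOffice conversion."""
    return [
        "soffice",
        "--headless",
        "--convert-to", fmt,
        "--outdir", str(outdir),
        str(src),
    ]


def convert(src: Path, outdir: Path, fmt: str = "pdf", timeout: int = 120) -> Path:
    """Convert src with soffice and return the expected output path.

    Hypothetical helper: it fails early on two of the preconditions listed
    above (binary present, /proc visible) before shelling out.
    """
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice not found on PATH")
    if not Path("/proc/self").exists():
        raise RuntimeError("/proc is not mounted; soffice is unlikely to work")
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(src, outdir, fmt), check=True, timeout=timeout)
    # soffice names the output after the input stem plus the target extension.
    return outdir / f"{src.stem}.{fmt}"
```

Something like `convert(Path("report.docx"), Path("/tmp/out"))` is what the document skills would end up running inside the sandbox.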

## What this enhancement would enable

Once these gaps are closed, LibreChat can deliver document-processing skills that shell out to `soffice` (DOCX → PDF, XLSX formula recalculation, PPTX thumbnailing, etc.) without the user shipping a custom image. The Anthropic-style `office` skills (docx/pptx/xlsx) are the obvious consumers.

## Proposed changes

Branch: https://github.com/On-Behalf-AI/LibreCodeInterpreter/tree/enhanced-runtime
Diff vs. usnavy13/main: 9 files changed, ~60 lines added.

The work is split across four commits; the main additions are:

**Commits 1 + 4: runtime dependencies** (https://github.com/On-Behalf-AI/LibreCodeInterpreter/commit/c4573c0, https://github.com/On-Behalf-AI/LibreCodeInterpreter/commit/291fe71)
- `Dockerfile` (+2 lines in the existing apt-install block): add `fonts-liberation` and `fonts-dejavu-core` (for Latin-script rendering), plus `qpdf` (PDF post-processing).
- `docker/requirements/python-documents.txt` (+3 lines): add `pypdf`, `pdfplumber`, `markitdown[pptx]`; remove `pdfminer>=20191125` and keep only `pdfminer.six>=20221105`.
  The legacy `pdfminer` package (last release 2019, Python 2 era) and `pdfminer.six` (its maintained successor) both install into the same top-level `pdfminer/` package directory, so the older one shadows the newer one. This breaks `pdfplumber` (and any consumer that imports recent symbols) with:
```
ImportError: cannot import name 'PDFStackT' from 'pdfminer.pdfinterp'
```
  `pdfminer.six` is the correct dependency for all modern PDF tooling; the legacy `pdfminer` entry appears to have been an oversight in the initial commit.
- `docker/requirements/nodejs.txt` (+2 lines): `docx`, `pptxgenjs`.
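
To spot the `pdfminer` / `pdfminer.six` clash in an existing image, a check along these lines could help. This is only a sketch: the function names are hypothetical, and it only reports that both distributions are installed, not which one won on disk.

```python
import re
from importlib import metadata


def _normalize(name: str) -> str:
    """Normalize a distribution name the way PEP 503 does."""
    return re.sub(r"[-_.]+", "-", name).lower()


def shadowing_pair_present(installed: set[str]) -> bool:
    """True if both the legacy pdfminer and pdfminer.six are installed."""
    names = {_normalize(n) for n in installed}
    return "pdfminer" in names and "pdfminer-six" in names


def installed_distributions() -> set[str]:
    """Names of all distributions visible to the current interpreter."""
    return {
        d.metadata["Name"]
        for d in metadata.distributions()
        if d.metadata["Name"]
    }


if __name__ == "__main__":
    if shadowing_pair_present(installed_distributions()):
        print("both pdfminer and pdfminer.six are installed; expect import errors")
```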


**Commit 2 — sandbox patches** (https://github.com/On-Behalf-AI/LibreCodeInterpreter/commit/51365ef)
- `docker/nsjail-base.cfg` (+26 lines): 3 read-only bind-mounts for `/etc/libreoffice`, `/etc/fonts`, `/usr/share/fonts`.
- `src/services/sandbox/executor.py`: add `py`/`python` to the languages with `/proc` unmasked (`{java, rs, bash}` → `{java, rs, py, python, bash}`).
  Add `XDG_CONFIG_HOME=/tmp/.config` so soffice can write its first-run profile.
- `src/services/sandbox/nsjail.py`: allow `bind` syscall for `{py, python, java, bash}` (`{bash}` → `{py, python, java, bash}`).
- `src/services/sandbox/pool.py` + `src/services/programmatic.py`: remove the explicit `/proc` mask so REPL/PTC can invoke soffice too.
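
In rough terms, the per-language policy after this commit looks like the sketch below. The constant and function names are made up for illustration and the real code in `executor.py` / `nsjail.py` is structured differently; I have also assumed here that `XDG_CONFIG_HOME` is set for every language.

```python
# Hypothetical names; only the language sets and the env value come from the patch.
PROC_UNMASKED_LANGS = frozenset({"java", "rs", "py", "python", "bash"})
BIND_ALLOWED_LANGS = frozenset({"py", "python", "java", "bash"})


def sandbox_policy(language: str) -> dict:
    """Summarize which relaxations apply to a given language key."""
    return {
        # /proc stays masked for every language outside the unmasked set.
        "mask_proc": language not in PROC_UNMASKED_LANGS,
        # bind(2) is only let through seccomp for the allowed set.
        "allow_bind": language in BIND_ALLOWED_LANGS,
        # Writable location for soffice's first-run profile (assumed global).
        "env": {"XDG_CONFIG_HOME": "/tmp/.config"},
    }
```

Note that `rs` keeps `/proc` visible but still has `bind` blocked, so the two sets are intentionally not identical.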

**Commit 3 — tesseract languages + raised memory limit** (https://github.com/On-Behalf-AI/LibreCodeInterpreter/commit/b263215)
- `Dockerfile`: add `tesseract-ocr-{fra,deu,spa,ita}` language packs. The base `tesseract-ocr` ships only English; non-English OCR fails silently. The four common Western European packs add ~30 MB to the image and unlock `pytesseract.image_to_string(lang="fra+eng")` etc.
- `docker/nsjail-base.cfg`: raise `rlimit_as` from 512 MiB to 1024 MiB. PDF OCR workflows render pages at 200 DPI (`pdf2image` → tesseract), which can momentarily allocate 400-700 MiB per page on a 4-page multilingual PDF. At 512 MiB the process was OOM-killed even with `del` + `gc.collect()` between pages; 1024 MiB is sufficient.
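
To confirm which address-space limit is actually in effect inside the sandbox, a small probe using the standard `resource` module can be run as user code. The helper names are mine, and this is just a sketch; it reads the soft `RLIMIT_AS` value, which is what `rlimit_as` controls.

```python
import resource


def to_mib(limit_bytes: int):
    """Convert an rlimit value in bytes to MiB, or None if unlimited."""
    if limit_bytes == resource.RLIM_INFINITY:
        return None
    return limit_bytes // (1024 * 1024)


def address_space_limit_mib():
    """Return the soft RLIMIT_AS of the current process in MiB."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    return to_mib(soft)


if __name__ == "__main__":
    limit = address_space_limit_mib()
    print("RLIMIT_AS:", "unlimited" if limit is None else f"{limit} MiB")
```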


## Security considerations

The two sandbox relaxations widen existing carve-outs (`/proc` visibility for `{java, rs, bash}`, the `bind` syscall for `{bash}`) to other interpreter runtimes. It is worth being explicit about why this stays acceptable:

1. **`/proc` visibility**: nsjail's PID namespace already restricts `/proc` to processes inside the sandbox. The only host info that leaks (via `/proc/cpuinfo` and `/proc/meminfo`) was already exposed to Bash users under the existing model, so extending it to Python and the REPL doesn't change the threat surface for the trusted-tenant deployment model these languages target.
2. **`bind(2)` syscall**: it was blocked to prevent server sockets, but the network namespace isolation (`--iface_no_lo` in the existing config) already prevents external connections. Allowing AF_UNIX `bind`, which is what soffice needs, therefore does not re-enable any network reachability path that the kernel-level network isolation hasn't already closed.

The three new bind-mounts (`/etc/libreoffice`, `/etc/fonts`, `/usr/share/fonts`) are all read-only and standard system locations — they expose no user data and no writable surface.
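
A quick way to verify the `bind` relaxation from inside the sandbox is to try binding an AF_UNIX socket in a temporary directory, which is the same kind of socket soffice uses. This is a sketch with a hypothetical helper name; it assumes the seccomp policy rejects the call with an error rather than killing the process, in which case the probe would never return.

```python
import os
import socket
import tempfile


def can_bind_unix_socket() -> bool:
    """Return True if bind(2) on an AF_UNIX socket succeeds in this process."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.sock")
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
                # A filesystem-path socket: no network interface is involved.
                sock.bind(path)
            return True
        except OSError:
            return False


if __name__ == "__main__":
    print("AF_UNIX bind allowed:", can_bind_unix_socket())
```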

## Next: your guidance

If you're broadly OK with the direction, I'll open a PR with the same four commits. If you'd prefer, I can also add a build-arg opt-in (e.g. `DOCUMENT_PROCESSING=1`) so that users who don't need soffice in-sandbox keep the stricter security envelope.

And many thanks for all the work on this repo. It's been a great foundation for our LibreChat deployment!
