Hi @usnavy13,
I have a small patch (~76 lines, 4 commits) that enables LibreOffice document conversion to work inside the existing nsjail sandbox, plus a couple of related fixes I hit along the way. I wanted to discuss the direction before opening a PR.
Context
LibreOffice is already installed in the upstream image (Dockerfile lines 54-55: libreoffice-impress libreoffice-writer libreoffice-calc libreoffice-common), but invoking soffice --headless --convert-to pdf from the sandboxed Python/Bash runtime currently runs into three distinct problems:
- /proc is not mounted: soffice reports "ERROR: /proc not mounted - LibreOffice is unlikely to work well if at all", since it requires /proc to be visible.
- seccomp blocks the bind syscall: soffice's oosplash and soffice.bin processes communicate internally over AF_UNIX sockets.
- Font discovery fails: /etc/fonts and /usr/share/fonts are not bind-mounted into the sandbox, so soffice falls back to a single internal font and produces unreadable output.
The consequence is that the LibreOffice binary ships in the image but is unusable from the sandbox in practice.
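A quick way to see which of these three preconditions hold from inside a sandboxed Python run is a small standard-library probe like the one below. It is a sketch: probe_soffice_prereqs is a hypothetical name, and if the seccomp policy kills the process on a blocked syscall instead of returning an errno, the bind check will end the probe rather than report False.

```python
import os
import socket
import tempfile


def probe_soffice_prereqs() -> dict[str, bool]:
    """Hypothetical helper: report which soffice prerequisites hold in this process."""
    results = {}

    # 1. /proc must be mounted; /proc/self only exists when procfs is visible.
    results["proc_mounted"] = os.path.exists("/proc/self")

    # 2. AF_UNIX bind(2) must be permitted (oosplash <-> soffice.bin IPC).
    #    A seccomp rule that returns an errno surfaces here as OSError.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.sock")
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.bind(path)
            results["unix_bind_allowed"] = True
        except OSError:
            results["unix_bind_allowed"] = False
        finally:
            sock.close()

    # 3. Font configuration and font files must both be visible.
    results["fonts_visible"] = os.path.isdir("/etc/fonts") and os.path.isdir("/usr/share/fonts")
    return results


if __name__ == "__main__":
    for check, ok in probe_soffice_prereqs().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```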
What this enhancement would enable
Once these gaps are closed, LibreChat can offer document-processing skills that shell out to soffice (DOCX → PDF conversion, XLSX formula recalculation, PPTX thumbnailing, etc.) without users having to ship a custom image. The Anthropic-style office skills (docx/pptx/xlsx) are the obvious first consumers.
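For illustration, this is roughly the call such a skill would make. It is a minimal sketch, not code from the branch: build_convert_cmd and convert_with_soffice are hypothetical helpers, soffice is assumed to be on PATH, and the default XDG_CONFIG_HOME mirrors the /tmp/.config value from Commit 2. --headless, --convert-to and --outdir are standard soffice options.

```python
import os
import subprocess
from pathlib import Path


def build_convert_cmd(src: str, outdir: str, target: str = "pdf") -> list[str]:
    """Headless soffice command converting src into outdir (e.g. DOCX -> PDF)."""
    return ["soffice", "--headless", "--convert-to", target, "--outdir", outdir, src]


def convert_with_soffice(src: str, outdir: str, target: str = "pdf", timeout: int = 120) -> Path:
    """Run the conversion and return the path of the produced file."""
    env = dict(os.environ)
    # Writable location for soffice's first-run profile (assumed value, as in Commit 2).
    env.setdefault("XDG_CONFIG_HOME", "/tmp/.config")
    subprocess.run(build_convert_cmd(src, outdir, target), check=True, timeout=timeout, env=env)
    # soffice names the output after the input's stem; strip any ":FilterName"
    # suffix from the target (e.g. "pdf:writer_pdf_Export") to get the extension.
    out = Path(outdir) / f"{Path(src).stem}.{target.split(':')[0]}"
    if not out.exists():
        raise FileNotFoundError(f"soffice did not produce {out}")
    return out
```

The explicit existence check is a defensive choice: it turns a missing output into a clear error instead of a confusing failure further down the skill.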
Proposed changes
Branch: https://github.com/On-Behalf-AI/LibreCodeInterpreter/tree/enhanced-runtime
Diff vs. usnavy13/main: 9 files changed, ~60 lines added.
The work is split across four commits; the main changes are:
Commit 1 (On-Behalf-AI@c4573c0) + Commit 4 (On-Behalf-AI@291fe71): runtime dependencies
Dockerfile (+2 lines, into the existing apt-install block): add fonts-liberation, fonts-dejavu-core (for Latin-script rendering), qpdf (PDF post-processing).
docker/requirements/python-documents.txt (+3 lines): add pypdf, pdfplumber and markitdown[pptx]; remove pdfminer>=20191125 and keep only pdfminer.six>=20221105.
The legacy pdfminer package (last release 2019, Python 2 era) and pdfminer.six (its maintained successor) both install the same top-level pdfminer package directory, so the older one shadows the newer one. This breaks pdfplumber (and any consumer that imports recent symbols) with:
ImportError: cannot import name 'PDFStackT' from 'pdfminer.pdfinterp'
pdfminer.six is the dependency that current PDF tooling (pdfplumber included) expects; the legacy pdfminer entry appears to have been an oversight in the initial commit.
docker/requirements/nodejs.txt (+2 lines): docx, pptxgenjs.
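To catch this kind of collision before it surfaces as an ImportError, the image build (or a test) could check whether both distributions are installed. A sketch using importlib.metadata from the standard library; pdfminer_conflict is a hypothetical helper, and the lookup parameter exists only so the check can be exercised with fake data.

```python
from importlib import metadata
from typing import Callable, Optional


def _installed_version(dist: str, lookup: Callable[[str], str]) -> Optional[str]:
    try:
        return lookup(dist)
    except metadata.PackageNotFoundError:
        return None


def pdfminer_conflict(lookup: Callable[[str], str] = metadata.version) -> Optional[tuple[str, str]]:
    """Return (legacy, six) versions if both distributions are installed, else None.

    Both ship the same top-level `pdfminer` package, so having both is a conflict.
    """
    legacy = _installed_version("pdfminer", lookup)
    six = _installed_version("pdfminer.six", lookup)
    if legacy and six:
        return legacy, six
    return None
```

A build step could call this and fail whenever it returns a tuple.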
Commit 2 (On-Behalf-AI@51365ef): sandbox patches
docker/nsjail-base.cfg (+26 lines): 3 read-only bind-mounts for /etc/libreoffice, /etc/fonts, /usr/share/fonts.
src/services/sandbox/executor.py: add py/python to the languages with /proc unmasked ({java, rs, bash} → {java, rs, py, python, bash}).
Add XDG_CONFIG_HOME=/tmp/.config so soffice can write its first-run profile.
src/services/sandbox/nsjail.py: allow bind syscall for {py, python, java, bash} ({bash} → {py, python, java, bash}).
src/services/sandbox/pool.py + src/services/programmatic.py: remove the explicit /proc mask so REPL/PTC can invoke soffice too.
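For readers less familiar with nsjail's config format, each of the three read-only bind mounts is a mount stanza in the protobuf text file. The helper below only renders such stanzas; it assumes the src, dst, is_bind and rw fields of nsjail's MountPt message, and the actual stanzas in nsjail-base.cfg on the branch may set additional fields.

```python
# Paths taken from the list above; the helper names are hypothetical.
DOCUMENT_MOUNTS = ("/etc/libreoffice", "/etc/fonts", "/usr/share/fonts")


def ro_bind_mount(path: str) -> str:
    """Render one read-only bind-mount stanza in nsjail's protobuf text format."""
    return (
        "mount {\n"
        f'  src: "{path}"\n'
        f'  dst: "{path}"\n'
        "  is_bind: true\n"
        "  rw: false\n"
        "}\n"
    )


def render_document_mounts(paths=DOCUMENT_MOUNTS) -> str:
    """Concatenate one stanza per path, same source and destination inside the jail."""
    return "".join(ro_bind_mount(p) for p in paths)


if __name__ == "__main__":
    print(render_document_mounts(), end="")
```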
Commit 3 (On-Behalf-AI@b263215): tesseract languages + raised memory limit
Dockerfile: add tesseract-ocr-{fra,deu,spa,ita} language packs. The base tesseract-ocr package ships only English, so non-English OCR fails silently. The four common Western European packs add ~30 MB to the image and enable calls such as pytesseract.image_to_string(image, lang="fra+eng").
docker/nsjail-base.cfg: raise rlimit_as from 512 MiB to 1024 MiB. PDF OCR workflows render pages at 200 DPI (pdf2image → tesseract), which can momentarily allocate 400-700 MiB per page on a 4-page multilingual PDF. At 512 MiB the job still ran out of memory even with del + gc.collect() between pages; 1024 MiB was sufficient.
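A sketch of the OCR side of this workflow, under stated assumptions: tesseract_lang and ocr_page_by_page are hypothetical helpers, not code from the branch. The first builds a lang string from the packs that are actually installed; the second renders and OCRs one page per iteration so only a single page bitmap is held at a time. That bounds memory across pages but does not lower the per-page peak, which is what the raised limit addresses.

```python
import gc
from typing import Callable, Iterable, Iterator, TypeVar

Image = TypeVar("Image")


def tesseract_lang(requested: Iterable[str], available: Iterable[str], fallback: str = "eng") -> str:
    """Build a tesseract lang string such as "fra+eng" from installed packs only."""
    installed = set(available)
    # Drop codes whose traineddata is not installed; keep order, avoid duplicates.
    langs = [code for code in dict.fromkeys(requested) if code in installed and code != fallback]
    langs.append(fallback)  # English ships with the base tesseract-ocr package
    return "+".join(langs)


def ocr_page_by_page(
    page_count: int,
    render_page: Callable[[int], Image],
    ocr: Callable[[Image], str],
) -> Iterator[str]:
    """Yield OCR text one page at a time instead of rendering the whole PDF up front."""
    for page in range(1, page_count + 1):  # pdf2image page numbers are 1-based
        image = render_page(page)
        text = ocr(image)
        del image  # drop this loop's reference to the bitmap before the next render
        gc.collect()
        yield text
```

With pdf2image and pytesseract installed, render_page could be `lambda p: convert_from_path(path, dpi=200, first_page=p, last_page=p)[0]`, page_count could come from `pdfinfo_from_path(path)["Pages"]`, and ocr could call `pytesseract.image_to_string(img, lang=tesseract_lang(["fra", "deu"], pytesseract.get_languages(config="")))`.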
Security considerations
The two sandbox relaxations extend carve-outs that previously covered only some runtimes (bind: Bash only; /proc: Java, Rust and Bash) to the Python interpreter and REPL, and, for bind, to Java as well. It is worth being explicit about why this remains acceptable:
/proc visibility: nsjail's PID namespace already limits the processes listed under /proc to those inside the sandbox. The host information that still leaks through files such as /proc/cpuinfo and /proc/meminfo was already exposed to Bash users under the existing model, so extending it to Python and the REPL doesn't change the threat surface for the trusted-tenant deployment model these languages target.
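bind(2) syscall: it was blocked to prevent server sockets, but the sandbox already runs in its own network namespace with the loopback interface left down (--iface_no_lo in the existing config), so external connections are impossible regardless. Since seccomp-bpf cannot inspect the sockaddr that bind(2) receives by pointer, the rule necessarily allows bind for every address family, not only AF_UNIX; this is acceptable because a socket bound inside the jail has no network path in or out. Allowing bind, which soffice needs for its AF_UNIX sockets, therefore does not re-enable any network reachability path that the kernel-level network isolation hasn't already closed.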
The three new bind-mounts (/etc/libreoffice, /etc/fonts, /usr/share/fonts) are all read-only and standard system locations — they expose no user data and no writable surface.
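To make the distinction concrete, this standard-library sketch binds a listening AF_UNIX socket to a filesystem path and echoes one message through it. It only illustrates the kind of local IPC that needs bind(2); it is not soffice's actual protocol, and unix_socket_roundtrip is a hypothetical name.

```python
import os
import socket
import tempfile
import threading


def unix_socket_roundtrip(payload: bytes = b"ping") -> bytes:
    """Bind a listening AF_UNIX socket on a file path and echo one message through it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ipc.sock")  # the address is a path, not an interface
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)  # the syscall Commit 2 unblocks
            server.listen(1)

            def echo_once() -> None:
                conn, _ = server.accept()
                with conn:
                    conn.sendall(conn.recv(1024))

            worker = threading.Thread(target=echo_once)
            worker.start()
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(path)
                client.sendall(payload)
                reply = client.recv(1024)
            worker.join()
    return reply
```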
Next: your guidance
=> If you're broadly OK with the direction, I'll open a PR with the same four commits. Alternatively, if you prefer, I can gate this behind an opt-in build arg (e.g. DOCUMENT_PROCESSING=1) so that users who don't need soffice in the sandbox keep the stricter security envelope.
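If the opt-in route is preferred, one option is to forward the build arg as an environment variable in the Dockerfile (ARG DOCUMENT_PROCESSING=0, then ENV DOCUMENT_PROCESSING=${DOCUMENT_PROCESSING}) and have the service choose its carve-outs from it. The sketch below is hypothetical: the constant and function names are invented, and the real executor.py / nsjail.py code is organised differently. The sets mirror the before/after values listed under Commit 2.

```python
import os
from typing import Mapping

# Hypothetical names. Carve-outs on upstream main, per the Commit 2 description.
BASE_PROC_LANGS = frozenset({"java", "rs", "bash"})
BASE_BIND_LANGS = frozenset({"bash"})
# Languages this patch would add when document processing is enabled.
DOC_PROC_LANGS = frozenset({"py", "python"})
DOC_BIND_LANGS = frozenset({"py", "python", "java"})


def document_processing_enabled(env: Mapping[str, str]) -> bool:
    """Interpret DOCUMENT_PROCESSING; anything other than 1/true/yes means off."""
    return env.get("DOCUMENT_PROCESSING", "0").strip().lower() in {"1", "true", "yes"}


def carve_outs(env: Mapping[str, str] = os.environ) -> tuple[frozenset, frozenset]:
    """Return (languages with /proc visible, languages allowed to call bind)."""
    if document_processing_enabled(env):
        return BASE_PROC_LANGS | DOC_PROC_LANGS, BASE_BIND_LANGS | DOC_BIND_LANGS
    return BASE_PROC_LANGS, BASE_BIND_LANGS
```

The extra apt packages and the three bind mounts would need the same gating at build time for the opt-out to restore the original envelope fully.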
And many thanks for all the work on this repo; it's been a great foundation for our LibreChat deployment!