Restructure embedeer as a single-package repo with @jsilvanus/embedeer and inline GPU provider detection #2
Agent-Logs-Url: https://github.com/jsilvanus/embedeer/sessions/898517f6-b6e8-43fa-98b5-5a16e9745dc5 Co-authored-by: jsilvanus <22452468+jsilvanus@users.noreply.github.com>
@copilot next, please work on CUDA so that we have a working GPU-accelerated MVP.
…b checks Agent-Logs-Url: https://github.com/jsilvanus/embedeer/sessions/7227830d-3d7c-4707-be90-545f53b4bc77 Co-authored-by: jsilvanus <22452468+jsilvanus@users.noreply.github.com>
Here's the CUDA MVP — commit Key discovery: What was implemented:
To use GPU on Linux (after this PR):
# Install CUDA 12 + cuDNN 9 (Ubuntu)
sudo apt install cuda-toolkit-12-6 libcudnn9-cuda-12
# Install packages
npm install embedeer
npm install @embedeer/ort-linux-x64-cuda
# Run with GPU
npx embedeer --model Xenova/all-MiniLM-L6-v2 --device gpu --data "Hello GPU"
@copilot some users might want to run this in Docker with CUDA enabled. Are we good to go on that already, or do we need something else?
…r Toolkit guide) Agent-Logs-Url: https://github.com/jsilvanus/embedeer/sessions/e005d328-62aa-468f-90b9-8982c069d6ff Co-authored-by: jsilvanus <22452468+jsilvanus@users.noreply.github.com>
Yes, Docker with CUDA already works with the current implementation — no code changes needed. Here's why it works automatically:
Commit
FROM nvidia/cuda:12.6.3-cudnn9-runtime-ubuntu24.04
# ... install Node.js ...
RUN npm install embedeer && npm install @embedeer/ort-linux-x64-cuda
docker run --rm --gpus all my-app \
npx embedeer --model Xenova/all-MiniLM-L6-v2 --device gpu --data "Hello GPU"
The only host requirement is NVIDIA Container Toolkit installed alongside the NVIDIA driver.
@copilot remove embedeer/ort-win32-x64-cuda (placeholder). This package will be @jsilvanus/embedeer, and the other packages will be scoped there as well.
…vanus on all packages Agent-Logs-Url: https://github.com/jsilvanus/embedeer/sessions/aba1be69-ab33-480c-9457-d0579962b5a4 Co-authored-by: jsilvanus <22452468+jsilvanus@users.noreply.github.com>
Done in commit
@copilot please prefix ort packages with embedeer-. Prepare this for npm publishing. Make sure everything is ready for it.
…ishing; show npx --device auto/gpu in READMEs Agent-Logs-Url: https://github.com/jsilvanus/embedeer/sessions/c6cb957d-1635-42ac-9914-88507015c38b Co-authored-by: jsilvanus <22452468+jsilvanus@users.noreply.github.com>
Done in commit Package renames:
npm publish readiness (all three packages):
npx + CUDA docs (new requirement): Updated |
…ve I/O docs Agent-Logs-Url: https://github.com/jsilvanus/embedeer/sessions/be63b249-589e-433f-bf10-b78ef1668bc1 Co-authored-by: jsilvanus <22452468+jsilvanus@users.noreply.github.com>
Done in commit New input option — configurable record delimiter (
# Null-byte delimited (safe with text containing newlines — compatible with find -print0)
printf 'Hello\0World\0' | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --delimiter '\0'
# Tab-delimited file
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --file data.tsv --delimiter '\t'
# Custom multi-char delimiter
printf 'foo|||bar|||baz' | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --delimiter '|||'
New output formats:
# JSONL — pipe to jq
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --output jsonl --data "foo" "bar" | jq '.text'
# CSV — for pandas
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --file texts.txt --output csv --dump out.csv
# JSON with source text
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --output json --with-text --data "cat" "dog"
Both READMEs now have dedicated Input Sources, Configurable delimiter, Output Formats, and Piping examples sections. 19 new tests cover all new paths.
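The delimiter and output options above could be implemented roughly as follows. This is only a sketch: the names `parseDelimiter`, `parseTexts`, and `formatOutput` echo helpers mentioned in this PR, but the signatures, escape handling, and CSV quoting here are assumptions, not the code in `src/cli.js`.

```javascript
// Hypothetical helpers; signatures and behavior are assumptions for illustration.

// Turn a shell-escaped delimiter such as '\0' or '\t' (backslash + letter)
// into the real character; anything else passes through unchanged.
function parseDelimiter(raw) {
  const escapes = { '\\0': '\0', '\\t': '\t', '\\n': '\n', '\\r': '\r' };
  return raw.replace(/\\[0tnr]/g, (match) => escapes[match]);
}

// Split raw input into records, dropping empty ones
// (for example the empty tail after a trailing delimiter).
function parseTexts(input, delimiter = '\n') {
  return input.split(delimiter).filter((text) => text.length > 0);
}

// Format texts and their embeddings (assumed to be plain number arrays).
function formatOutput(texts, embeddings, format) {
  if (format === 'jsonl') {
    // One JSON object per line, convenient for jq.
    return texts
      .map((text, i) => JSON.stringify({ text, embedding: embeddings[i] }))
      .join('\n');
  }
  if (format === 'csv') {
    const dims = embeddings.length > 0 ? embeddings[0].length : 0;
    const header = ['text', ...Array.from({ length: dims }, (_, d) => `dim_${d}`)].join(',');
    const rows = texts.map((text, i) =>
      // Quote the text field and double any embedded quotes.
      [`"${text.replace(/"/g, '""')}"`, ...embeddings[i]].join(','));
    return [header, ...rows].join('\n');
  }
  throw new Error(`Unsupported format: ${format}`);
}
```

With these definitions, `parseTexts('Hello\0World\0', parseDelimiter('\\0'))` yields the two records `Hello` and `World`.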
@copilot I also want an option where a line reader opens and you can just paste in some data. When it sees data come in, it starts work (maybe in batches?) and writes the output (probably to a file when the input reader is used).
…treaming output Agent-Logs-Url: https://github.com/jsilvanus/embedeer/sessions/b3d9aa41-2047-43a4-be9a-96e4015529eb Co-authored-by: jsilvanus <22452468+jsilvanus@users.noreply.github.com>
Done in commit
# Interactive terminal: paste lines, Ctrl+D when done
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --interactive --dump embeddings.jsonl
# Stream a file in batches (no terminal needed)
cat corpus.txt | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 -i --output csv --dump out.csv
# GPU, custom batch size
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --interactive --device auto \
--batch-size 16 --output jsonl --dump out.jsonl
How flushing works:
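The flushing rules themselves are not reproduced in this thread. Purely as an illustration, one plausible scheme is to flush whenever `--batch-size` lines have accumulated and once more at end of input. In the sketch below, `createBatcher` and `runInteractive` are hypothetical names, and skipping blank lines is an assumption.

```javascript
const readline = require('node:readline');

// Hypothetical batching helper: collects lines and hands them to onFlush
// in groups of batchSize; end() flushes whatever is left.
function createBatcher(batchSize, onFlush) {
  let pending = [];
  const flush = () => {
    if (pending.length === 0) return;
    const batch = pending;
    pending = [];
    onFlush(batch);
  };
  return {
    push(line) {
      if (line.trim() === '') return; // assumption: blank lines are ignored
      pending.push(line);
      if (pending.length >= batchSize) flush();
    },
    end: flush,
  };
}

// Wire the batcher to a readable stream such as process.stdin.
// 'close' fires on Ctrl+D in a terminal or at EOF of a pipe.
function runInteractive(input, batchSize, onFlush) {
  const batcher = createBatcher(batchSize, onFlush);
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  rl.on('line', (line) => batcher.push(line));
  rl.on('close', () => batcher.end());
  return rl;
}
```

A caller would pass something like `runInteractive(process.stdin, 16, (batch) => { /* embed and append to the dump file */ })`.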
@copilot fix any remaining errors.
…sions Agent-Logs-Url: https://github.com/jsilvanus/embedeer/sessions/2e60c137-0653-4608-ba14-5926c7456201 Co-authored-by: jsilvanus <22452468+jsilvanus@users.noreply.github.com>
Fixed in commit Root cause: Fix: Added All 80 tests now pass ( |
@copilot Do we need the embedeer-ort* packages at all, if everything comes in the basic deps already?
Technically no — the ort-* packages ship zero new binaries.
Why the two-package pattern was added: We could simplify by:
Want me to go ahead with that simplification? It would reduce the repo to a single publishable package ( |
@copilot Yes: delete both ort packages and move the system checks inline into provider-loader.js. Convert this back to a normal repo, not monorepo anymore, since this changes things. |
Agent-Logs-Url: https://github.com/jsilvanus/embedeer/sessions/6286b261-52f2-4560-ad02-795169949161 Co-authored-by: jsilvanus <22452468+jsilvanus@users.noreply.github.com>
Done in commit
@copilot Are the PR description and README up to date?
Yes, both are accurate and current:
README correctly describes the single-package structure:
PR description was already updated in the previous commit to reflect the single-package repo with inline GPU system checks.
One small note:
Converts the repo from an npm workspaces monorepo back to a standard single-package repository (@jsilvanus/embedeer) and replaces the optional sidecar ort packages with inline GPU system checks. The package is ready for npm publishing.
Structure
Single package at the repo root:
Install
npm install @jsilvanus/embedeer  # CPU always works; GPU auto-detected if system libs present
No separate GPU sidecar packages needed. onnxruntime-node (already bundled via @huggingface/transformers) ships libonnxruntime_providers_cuda.so on Linux x64 and DirectML on Windows, so no extra install is required.
New device / provider API
Added to Embedder.create(), WorkerPool, and CLI (--device, --provider):
Provider selection logic (src/provider-loader.js)
All system checks are inlined, with no dynamic package imports:
- device='auto': tries platform-default providers (Linux x64 → cuda; Windows x64 → dml), silently falls back to CPU
- device='gpu': throws with a diagnostic error (lists missing CUDA libs or unsupported platform) if no GPU provider succeeds
- device='cpu' / provider='cpu': always CPU
- provider=cuda|dml: runs inline checks or throws with a diagnostic error; unsupported platform surfaces "not supported on …" message
Linux x64: CUDA 12 inline checks
provider-loader.js checks:
- /dev/nvidiactl exists (NVIDIA driver present)
- ldconfig -p: libcudart.so.12, libcublas.so.12, libcublasLt.so.12, libcurand.so.10, libcufft.so.11, libcudnn.so.9
- apt install instructions if anything is missing
System requirements: NVIDIA GPU + driver ≥ 525, CUDA 12 toolkit, cuDNN 9.
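The selection rules and the Linux checks described above can be sketched as below. This is not the contents of `src/provider-loader.js`: the `env` parameter (added so the probes can be faked in tests), the `checkProvider` helper, the string return values, and the exact error wording are all assumptions. Only the device/provider rules, the `/dev/nvidiactl` probe, and the library list come from the description.

```javascript
const fs = require('node:fs');
const { execSync } = require('node:child_process');

const CUDA_LIBS = ['libcudart.so.12', 'libcublas.so.12', 'libcublasLt.so.12',
  'libcurand.so.10', 'libcufft.so.11', 'libcudnn.so.9'];

// Real system probes; the env parameter is a testing convenience (assumption).
const defaultEnv = {
  platform: process.platform,
  arch: process.arch,
  exists: (path) => fs.existsSync(path),
  ldconfig: () => {
    try { return execSync('ldconfig -p', { encoding: 'utf8' }); } catch { return ''; }
  },
};

// Returns null when the provider looks usable, otherwise a diagnostic string.
function checkProvider(name, env) {
  if (name === 'cuda') {
    if (env.platform !== 'linux' || env.arch !== 'x64') {
      return `cuda is not supported on ${env.platform}/${env.arch}`;
    }
    if (!env.exists('/dev/nvidiactl')) return 'NVIDIA driver not found (/dev/nvidiactl missing)';
    const listing = env.ldconfig();
    const missing = CUDA_LIBS.filter((lib) => !listing.includes(lib));
    return missing.length > 0 ? `missing CUDA libraries: ${missing.join(', ')}` : null;
  }
  if (name === 'dml') {
    return env.platform === 'win32' ? null : `dml is not supported on ${env.platform}`;
  }
  return `unknown provider: ${name}`;
}

function resolveProvider(device = 'auto', provider, env = defaultEnv) {
  if (device === 'cpu' || provider === 'cpu') return 'cpu';
  if (provider) {
    // Explicit provider: run its checks or fail loudly.
    const problem = checkProvider(provider, env);
    if (problem) throw new Error(problem);
    return provider;
  }
  const candidates = env.arch !== 'x64' ? []
    : env.platform === 'linux' ? ['cuda']
      : env.platform === 'win32' ? ['dml'] : [];
  const problems = [];
  for (const name of candidates) {
    const problem = checkProvider(name, env);
    if (!problem) return name;
    problems.push(problem);
  }
  if (device === 'gpu') {
    throw new Error(problems.join('; ')
      || `no GPU provider is supported on ${env.platform}/${env.arch}`);
  }
  return 'cpu'; // device === 'auto': silent fallback
}
```

Keeping the probes behind an injectable object is one way to unit-test the "no NVIDIA device" and "missing CUDA libs" paths without a GPU machine.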
Docker + NVIDIA CUDA
Works out of the box with NVIDIA Container Toolkit:
- /dev/nvidiactl is mounted into the container automatically
- LD_LIBRARY_PATH to paths already in the default search list
docker run --rm --gpus all my-app \
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --device gpu --data "Hello GPU"
Host requirement: NVIDIA Container Toolkit.
Windows x64 — DirectML inline checks
DirectML is bundled with onnxruntime-node on Windows. provider-loader.js just verifies process.platform === 'win32'. Supports NVIDIA, AMD, and Intel GPUs on Windows 10 (1903+) / Windows 11.
Workers
Both worker.js (child process) and thread-worker-script.js (thread) call resolveProvider(device, provider) before pipeline(). device and provider flow through WorkerPool → workerData.
npm publish readiness
- "publishConfig": { "access": "public" } (required for scoped packages)
- "files" array publishes only src/ + README.md
- "engines": { "node": ">=18" }
- "author": "jsilvanus"
Thorough I/O and piping (src/cli.js)
Configurable record delimiter (-D / --delimiter <str>)
Output formats
- json (default)
- json --with-text: {"text","embedding"} objects
- jsonl: jq, streaming
- csv: text,dim_0,...,dim_N
- txt
- txt --with-text: <text>\t<floats>
- sql: INSERT INTO embeddings ...
Interactive line-reader mode (-i / --interactive)
npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 --interactive --dump embeddings.jsonl
cat corpus.txt | npx @jsilvanus/embedeer --model Xenova/all-MiniLM-L6-v2 -i --output csv --dump out.csv
Tests
Tests in test/provider-loader.test.js cover platform defaults, tryLoadProvider failure modes (no NVIDIA device, missing CUDA libs), and all resolveProvider paths. Tests in test/cli-format.test.js cover parseDelimiter, parseTexts, and all formatOutput paths. All tests pass.
Original prompt
Create a monorepo for jsilvanus/embedeer that supports optional GPU providers installable via npm in a two-step install, and update the main embedeer package to dynamically load the appropriate provider at runtime.
Context (current state in PR #1 branch copilot/create-embedding-tool):
Goal:
Convert repository into a monorepo (npm workspaces) with:
Two-step install is acceptable:
Runtime selection / API changes in embedeer:
Provider packages responsibilities:
Ensure the worker modes still work:
Tests:
CI/build:
Output:
The following is the prior conversation context from the user's chat exploration (may be truncated):
User: See its pr 1
Assistant: You've already got the key piece for GPU support in PR #1: you're using @huggingface/transformers, and that in turn uses ONNX Runtime (onnxruntime-node is in your lockfile). So the GPU plan is "make ONNX Runtime use a GPU execution provider" (CUDA/DirectML/CoreML/etc.), not "write shaders".
What PR #1 is doing today (relevant to GPU)
src/worker.js, src/thread-worker-script.js) call:
- pipeline('feature-extraction', modelName, { dtype })
- extractor(texts, { pooling, normalize })
The practical way to add GPU support here
Step 1: Add a --device / device option, propagate it into workerData
Add device: 'auto'|'cpu'|'cuda' (start with those; you can extend later).
Where to thread it through:
- src/cli.js: add --device
- src/embedder.js: accept device, pass to WorkerPool
- src/worker-pool.js: include device in workerData
- src/worker.js and src/thread-worker-script.js: read device and configure runtime before creating the pipeline
Step 2: Ensure ONNX Runtime with CUDA is actually available
This is the hardest part in Node land. There are typically two workable approaches:
A) “Bring your own CUDA ORT” via container (recommended)
device=cuda; if CUDA isn't present it falls back to CPU.
B) Make GPU optional via an extra dependency
This pull request was created from Copilot chat.