Releases: NetApp/Innovation-Labs

NetApp Project Neo v4.1.4

20 May 13:09
02137de

NetApp NEO 4.1.4

Container Images

Service | Image | Pull command
API | ghcr.io/netapp/netapp-neo-api | docker pull ghcr.io/netapp/netapp-neo-api:4.1.4
Worker | ghcr.io/netapp/netapp-neo-worker | docker pull ghcr.io/netapp/netapp-neo-worker:4.1.4
Extractor (CPU) | ghcr.io/netapp/netapp-neo-extractor-full | docker pull ghcr.io/netapp/netapp-neo-extractor-full:4.1.4
Extractor (CUDA) | ghcr.io/netapp/netapp-neo-extractor-cuda-full | docker pull ghcr.io/netapp/netapp-neo-extractor-cuda-full:4.1.4
Extractor (ROCm) | ghcr.io/netapp/netapp-neo-extractor-rocm-full | docker pull ghcr.io/netapp/netapp-neo-extractor-rocm-full:4.1.4
NER | ghcr.io/netapp/netapp-neo-ner-full | docker pull ghcr.io/netapp/netapp-neo-ner-full:4.1.4

All images are available for linux/amd64 and linux/arm64, except the CUDA and ROCm variants, which are amd64 only.

Quick Start

  1. Download the deployment ZIP from this release
  2. Extract and configure your environment:
    unzip netapp-neo-4.1.4.zip
    cp .env.example .env
    # Edit .env with your settings (see comments for required values)
  3. Start all services:
    docker compose up -d
  4. Access the web console at http://localhost:8081
  5. The API is available at http://localhost:8000

GPU Support

For NVIDIA CUDA acceleration (extractor + NER):

docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d

For AMD ROCm GPUs, edit docker-compose.gpu.yml and change the extractor image to:

ghcr.io/netapp/netapp-neo-extractor-rocm-full:4.1.4

What's New

Pre-baked ML Model Images

  • Full image variants -- Extractor and NER images now ship with all ML models pre-baked into the container, eliminating the need for runtime model downloads. This significantly reduces first-start time and enables fully air-gapped deployments.

Capacity-Based Licensing

  • TB-based license limits -- License keys can now enforce storage capacity limits, enabling more granular entitlement management per deployment.

OCR Engine Fallback

  • Automatic OCR fallback -- When the primary OCR engine returns no content for a document, the extractor now automatically falls back to alternate OCR engines, improving extraction success rates for difficult documents.
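The fallback behaviour amounts to trying OCR engines in order until one returns non-empty text. The sketch below is a minimal illustration under stated assumptions: the engine names, the `extract_with_fallback` helper, and the engine signature are hypothetical and do not reflect Neo's actual extractor code or engine order.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical OCR engine signature: raw document bytes in, extracted text out.
OcrEngine = Callable[[bytes], str]


def extract_with_fallback(
    document: bytes,
    engines: Sequence[Tuple[str, OcrEngine]],
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (name, engine) pair in order.

    Returns (engine_name, text) for the first engine whose output contains
    non-whitespace text, or (None, None) if every engine comes back empty.
    """
    for name, engine in engines:
        text = engine(document)
        if text and text.strip():
            return name, text
    return None, None


# Stand-ins for real OCR engines (hypothetical names and behaviour).
def primary_ocr(doc: bytes) -> str:
    return ""  # simulates the primary engine finding no content


def alternate_ocr(doc: bytes) -> str:
    return "Invoice #1234"


engine_name, text = extract_with_fallback(
    b"scanned page bytes",
    [("primary", primary_ocr), ("alternate", alternate_ocr)],
)
print(engine_name, text)  # alternate Invoice #1234
```

Returning (None, None) instead of raising leaves the caller free to record the work item as failed when no engine produced content.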

Extractor Queue Enhancements

  • Improved queue throughput -- The extraction work queue has been optimized with better tail handling, reducing idle time and improving overall extraction throughput on large crawl jobs.
  • Failed extraction handling -- Work items that return no content after extraction are now explicitly marked as failed, giving operators clear visibility into extraction issues.

Stability & Reliability

  • Event-loop starvation fixes -- Resolved multiple scenarios where long-running database operations could starve the async event loop, causing liveness probe failures and worker restarts. Database calls in the work queue and ACL backfill paths are now offloaded to the thread pool.
  • Database locking improvements -- Read-only queries now properly commit to prevent PostgreSQL idle-in-transaction locking. Index creation retries on lock timeouts during startup. Stale service coordination records are cleaned up during maintenance windows.
  • OpenShift compatibility -- Fixes for running Neo in OpenShift-managed Kubernetes environments.
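The event-loop starvation fix follows a standard asyncio pattern: run the blocking database call in a worker thread so the loop stays free to answer liveness probes. A minimal sketch, assuming Python 3.9+ for `asyncio.to_thread`; `fetch_pending_work` is a hypothetical blocking DB helper simulated with `time.sleep`, and `heartbeat` stands in for a liveness endpoint. Neither reflects Neo's actual code.

```python
import asyncio
import time


def fetch_pending_work(batch_size: int) -> list[int]:
    """Hypothetical blocking DB call, simulated with time.sleep."""
    time.sleep(0.2)
    return list(range(batch_size))


async def heartbeat(ticks: list[float], interval: float = 0.05) -> None:
    """Stands in for a liveness endpoint: records a tick whenever the loop runs it."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def main() -> tuple[list[int], int]:
    ticks: list[float] = []
    hb = asyncio.create_task(heartbeat(ticks))
    # Run the blocking call in the default thread pool so the event loop keeps
    # scheduling the heartbeat while the "query" is in flight. Calling
    # fetch_pending_work(5) directly here would block the loop for the full 0.2 s.
    work = await asyncio.to_thread(fetch_pending_work, 5)
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return work, len(ticks)


if __name__ == "__main__":
    work, tick_count = asyncio.run(main())
    print(work, tick_count)  # several heartbeat ticks recorded while the call ran
```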

Dependency Updates

  • Docling updated to v2.92.0 with improved document parsing and layout detection.

Bug Fixes

  • Fixed incorrect column reference in user listing query.
  • Fixed SQL pattern escaping in file count estimation.
  • Standardised user table schema for consistent password hashing.

netapp-neo-26.4.4

30 Apr 14:42
02137de

NetApp Neo v4.x — context lake microservice architecture for AI services via MCP service

NetApp Project Neo v4.1.2

21 Apr 22:02
595ebee

Scale-performance release. All nine changes are additive at the API contract level: defaults preserve 4.1.0/4.1.1 behaviour, and new parameters are opt-in. The count fields in listing responses (total_count, total_size, total_pages) become optional to support low-overhead modes. Validated against a 2.099 billion-row file_metadata dataset spread across 2,100 LIST partitions.

What's New

New API parameters

  • ?after_modified_time=<ISO-8601> on /api/v1/shares/{share_id}/files and /api/v1/files -- keyset pagination ordered by modified_time DESC. Returns rows strictly older than the cursor; responses include next_cursor for fetching the next page. Roughly 300× faster than OFFSET at 1 B rows, with latency that stays flat as the dataset grows. Prefer this when paging deep into large listings; unlike OFFSET, keyset pagination walks forward from a cursor and cannot jump directly to an arbitrary page number.
  • ?include_counts=false on the same endpoints — skips the cross-partition COUNT(*) and SUM(size) aggregates that previously ran on every request. When omitted, unfiltered listings now use a fast n_live_tup partition-stats estimate for total_count and omit total_size. Filtered listings preserve exact counts (slow path).
  • ?limit= and ?offset= on /api/v1/shares -- previously silently ignored; now honoured. The default (no params) still returns the full list for backward compatibility; multi-tenant deployments should pass an explicit ?limit=.
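A client walking a large share with the keyset cursor might loop as follows. This is a sketch, not an official client: `fetch_page` stands in for an HTTP GET against /api/v1/shares/{share_id}/files, and the response shape (an `items` list plus `next_cursor`, null once the listing is exhausted) is an assumption beyond what these notes state. `fake_fetch` is a hypothetical in-memory server used only to exercise the loop.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: takes query params, returns the decoded JSON body.
FetchPage = Callable[[Dict[str, str]], Dict[str, Any]]


def iter_share_files(fetch_page: FetchPage) -> Iterator[Dict[str, Any]]:
    """Yield every file by following next_cursor until the server stops returning one.

    Assumes each body looks like {"items": [...], "next_cursor": "<ISO-8601>" or None};
    the "items" key is an assumption, not taken from the API docs.
    """
    params: Dict[str, str] = {"include_counts": "false"}  # skip COUNT(*)/SUM(size) work
    while True:
        body = fetch_page(params)
        yield from body.get("items", [])
        cursor: Optional[str] = body.get("next_cursor")
        if not cursor:
            return
        params = {"include_counts": "false", "after_modified_time": cursor}


# In-memory stand-in for the API: rows ordered by modified_time DESC, two per page.
ROWS: List[Dict[str, str]] = [
    {"name": "c.txt", "modified_time": "2025-04-03T00:00:00Z"},
    {"name": "b.txt", "modified_time": "2025-04-02T00:00:00Z"},
    {"name": "a.txt", "modified_time": "2025-04-01T00:00:00Z"},
]


def fake_fetch(params: Dict[str, str]) -> Dict[str, Any]:
    cursor = params.get("after_modified_time")
    # Keyset predicate: only rows strictly older than the cursor
    # (same-format ISO-8601 strings compare correctly as text).
    rows = [r for r in ROWS if cursor is None or r["modified_time"] < cursor]
    page = rows[:2]
    next_cursor = page[-1]["modified_time"] if len(rows) > 2 else None
    return {"items": page, "next_cursor": next_cursor}


print([f["name"] for f in iter_share_files(fake_fetch)])  # ['c.txt', 'b.txt', 'a.txt']
```

In a database with an index on modified_time, the `modified_time < cursor` predicate lets each page start at the cursor instead of skipping OFFSET rows, which is why the cost stays flat with depth.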

API runtime

  • UVICORN_WORKERS environment variable on the API service — configures uvicorn --workers N from deployment values. Default is 1 (matches 4.1.0/4.1.1). Bump to 4 for read-heavy deployments; each worker is an independent Python process with its own DB pool, so scaling comes at the cost of N× memory.

Companion chart release

Helm chart updates ship alongside on NetApp/Innovation-Labs: sensible API resource defaults, nodeSelector / tolerations / extraArgs support on the Postgres StatefulSet, and a chart-default idle_in_transaction_session_timeout=60s on Postgres that works together with the application-side lock_timeout to keep share-creation latency bounded.

Container Images

All images are available at ghcr.io/netapp. Pull with docker pull ghcr.io/netapp/<image>:4.1.2.

Image | Platforms | Description
netapp-neo-api:4.1.2 | amd64, arm64 | REST API + MCP transport
netapp-neo-worker:4.1.2 | amd64, arm64 | Background processing (crawl, ACL, upload)
netapp-neo-extractor:4.1.2 | amd64, arm64 | Content extraction (CPU)
netapp-neo-extractor-cuda:4.1.2 | amd64, arm64 | Content extraction (NVIDIA GPU)
netapp-neo-extractor-rocm:4.1.2 | amd64 | Content extraction (AMD GPU)
netapp-neo-ner:4.1.2 | amd64, arm64 | Named Entity Recognition (GLiNER2)

Quick Start

docker pull ghcr.io/netapp/netapp-neo-api:4.1.2
docker pull ghcr.io/netapp/netapp-neo-worker:4.1.2
docker pull ghcr.io/netapp/netapp-neo-extractor:4.1.2
docker pull ghcr.io/netapp/netapp-neo-ner:4.1.2

Use the attached docker-compose.yml (or docker-compose.gpu.yml for NVIDIA) plus .env.example to bring up a local deployment. For multi-billion-file deployments, also tune Postgres via postgresql.extraArgs in the companion Helm chart (see PERFORMANCE_TUNING_GUIDE).

Upgrade notes

  • From 4.1.0 / 4.1.1: drop-in replacement; no data migration required. The new idx_file_metadata_modified_time is created idempotently at startup.
  • Clients that read total_count / total_size / total_pages: these fields are now Optional[int]. They remain populated with the exact value for callers written against 4.1.0/4.1.1 (which don't pass the new query params) and are null when the caller opts into ?include_counts=false.
  • Scale recommendation: for shares with more than ~100K files, migrate paginated file listing to keyset (?after_modified_time=). OFFSET remains supported but degrades linearly with offset.

Quality

  • 2.099 billion-row validation across 2,100 LIST partitions on a tuned Postgres 17 deployment (K3s, 2 TB NVMe-backed PVC, shared_buffers=32 GB)
  • Read-side SLOs at 2 B: file_by_id 37 ms p95, keyset listing 18 ms p95, cross_share_agg 63 ms p95, share_listing 14 ms p95
  • Chaos recovery at 2 B: graceful Postgres pod kill 6 s, force kill 13 s, API pod kill 14 s — WAL-bounded, not data-volume-bounded
  • Cascade share-delete at 1 M-row / 776 MB partition: 5.98 s clean with DROP … CASCADE (pre-fix: silent partition leak)
  • All unit and integration suites pass on CI

netapp-neo-26.4.3

21 Apr 15:59
595ebee

netapp-neo-26.4.2

21 Apr 13:12
8ce41fb

NetApp Project Neo v4.1.0

20 Apr 17:28

A maintenance and capability release focused on enterprise scale, MCP/dataset features, and operational reliability. Drop-in upgrade from 4.0.x.

What's New

Datasets, MCP, and Search

  • Virtual datasets — query files by share, schedule, type, or pattern without persisting an enumeration. Includes cursor pagination and rollup aggregates for billion-scale deployments.
  • count_entity_mentions MCP tool — fast aggregated entity-mention counts across files, served from ner_entity_aggregates rollups instead of scanning raw entities.
  • ~20× faster full-text search -- ts_headline is now deferred until after LIMIT, eliminating the dominant cost on result pages with snippets.
  • /api/v1/files filename filter restored — silently ignored since a refactor; now honoured as documented.

Performance & Scale

  • Batched NER aggregate upserts — single statement per file replaces per-row upserts; large reductions in DB round-trips during NER fan-in.
  • Partition lifecycle on share delete -- ner_entities and ner_entity_aggregates partitions are dropped alongside file partitions, preventing schema bloat in long-lived deployments.
  • FTS retry window — extends FTS readiness checks during large bulk loads so initial indexing doesn't time out.
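The batched-upsert idea is to pre-aggregate a file's entity mentions in memory and write them with one multi-row INSERT ... ON CONFLICT statement instead of one round-trip per mention. The sketch below uses the standard-library sqlite3 module (SQLite 3.24+ accepts the same upsert syntax as PostgreSQL) purely for illustration; the `entity_counts` table and the `upsert_file_entities` helper are hypothetical simplifications, not Neo's ner_entity_aggregates schema. A production version would also need to chunk very large batches to stay under the driver's bound-parameter limit.

```python
import sqlite3
from collections import Counter
from typing import Iterable, List, Tuple, Union


def upsert_file_entities(
    conn: sqlite3.Connection, file_id: int, mentions: Iterable[Tuple[str, str]]
) -> int:
    """Collapse (label, entity_text) mentions into counts, then write them in ONE statement.

    Returns the number of distinct entities written for this file.
    """
    counts = Counter(mentions)
    if not counts:
        return 0
    placeholders = ", ".join(["(?, ?, ?, ?)"] * len(counts))
    params: List[Union[int, str]] = []
    for (label, entity_text), n in counts.items():
        params.extend([file_id, label, entity_text, n])
    conn.execute(
        "INSERT INTO entity_counts (file_id, label, entity_text, mentions) "
        f"VALUES {placeholders} "
        "ON CONFLICT (file_id, label, entity_text) "
        "DO UPDATE SET mentions = mentions + excluded.mentions",
        params,
    )
    return len(counts)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entity_counts (file_id INTEGER, label TEXT, entity_text TEXT, "
    "mentions INTEGER, PRIMARY KEY (file_id, label, entity_text))"
)
upsert_file_entities(conn, 1, [("ORG", "NetApp"), ("ORG", "NetApp"), ("PERSON", "Ada")])
upsert_file_entities(conn, 1, [("ORG", "NetApp")])
print(conn.execute("SELECT label, entity_text, mentions FROM entity_counts ORDER BY label").fetchall())
# [('ORG', 'NetApp', 3), ('PERSON', 'Ada', 1)]
```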

Crawl & Worker Reliability

  • VARCHAR(50) overflow on long file extensions -- fixed; some Office/legacy extensions are now stored during crawl without truncation errors.
  • Manual crawl_schedule no longer triggers cron parse error — manual schedules are detected and bypass the cron parser.
  • Worker env var names aligned with docker-compose.yml and docs — NUM_*_WORKERS set in compose are honoured by the worker service.
  • NFS host guidance updated in shipped configs — clearer separation of in-cluster vs. host-mounted NFS configurations.
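The manual-schedule fix comes down to checking for a "manual" marker before the value ever reaches a cron parser. A minimal sketch under assumptions: the sentinel values, the `parse_crawl_schedule` helper, and the toy field-count check (a stand-in for a real cron library's validation) are all hypothetical.

```python
from typing import List, Optional

MANUAL_SENTINELS = {"", "manual"}  # hypothetical markers for "run on demand only"


def parse_crawl_schedule(schedule: Optional[str]) -> Optional[List[str]]:
    """Return the five cron fields, or None for a manual schedule.

    Manual schedules are detected first so they never reach the cron parser,
    which would otherwise reject them with a parse error.
    """
    if schedule is None or schedule.strip().lower() in MANUAL_SENTINELS:
        return None
    fields = schedule.split()
    if len(fields) != 5:  # toy stand-in for a real cron parser's validation
        raise ValueError(f"invalid cron expression: {schedule!r}")
    return fields


print(parse_crawl_schedule("manual"))     # None
print(parse_crawl_schedule("0 2 * * *"))  # ['0', '2', '*', '*', '*']
```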

Sizing & Operations

  • Sizing API exposes ACL/NER worker counts — deployment sizing recommendations now include the full set of horizontally-scalable worker pools.
  • Per-protocol E2E pipelines — focused tests added for SMB, NFS, and S3 paths with cascade-cleanup verification for share deletion.

Bug Fixes

  • NER stats endpoint returned zero when share_id was supplied — SQL syntax bug fixed.
  • Test refresh for procedural drift (worker health via exec, FTS page_size, content field handling).
  • NER engine tests updated for the counted-entity response format.

Build & Dependencies

  • Security floors hardened across pyproject.toml; PyJWT removed where unused.
  • Tier 1 + Tier 2 dependency bumps for security and freshness; floors harmonised with root requirements pins.
  • --internal build flag for non-public image builds.
  • arm64 CUDA builds skip onnxruntime-gpu (not published for that platform).

Container Images

All images are available at ghcr.io/netapp. Pull with docker pull ghcr.io/netapp/<image>:4.1.0.

Image | Platforms | Description
netapp-neo-api:4.1.0 | amd64, arm64 | REST API + MCP transport
netapp-neo-worker:4.1.0 | amd64, arm64 | Background processing (crawl, ACL, upload)
netapp-neo-extractor:4.1.0 | amd64, arm64 | Content extraction (CPU)
netapp-neo-extractor-cuda:4.1.0 | amd64, arm64 | Content extraction (NVIDIA GPU)
netapp-neo-extractor-rocm:4.1.0 | amd64 | Content extraction (AMD GPU)
netapp-neo-ner:4.1.0 | amd64, arm64 | Named Entity Recognition (GLiNER2)

Quick Start

# Pull all core images
docker pull ghcr.io/netapp/netapp-neo-api:4.1.0
docker pull ghcr.io/netapp/netapp-neo-worker:4.1.0
docker pull ghcr.io/netapp/netapp-neo-extractor:4.1.0
docker pull ghcr.io/netapp/netapp-neo-ner:4.1.0

Use the attached docker-compose.yml (or docker-compose.gpu.yml for NVIDIA) and .env.example to bring up a local deployment.

Quality

  • All unit tests and the test suite run green on release/4.1.0 at tag time.
  • E2E pipelines validated across NFS, SMB, and S3 protocols, including cascade-cleanup of share deletion.

netapp-neo-26.4.1

16 Apr 06:37

netapp-neo-26.3.2

10 Apr 12:50
c15ca00

NetApp Project Neo v4.0.3p9

07 Apr 13:47
612e53b

What's New

Improved Service Resilience

Admin user creation is now decoupled from worker initialization, ensuring authentication always works even if background workers fail to start. Worker initialization also now automatically retries on failure instead of leaving the service in a broken state.

  • Independent admin account creation -- The admin user is created as a standalone step before worker components initialize, so API authentication is available immediately after setup completes
  • Automatic worker retry -- If worker initialization fails (e.g., due to a transient Graph API or database issue), the service automatically retries instead of requiring a manual restart
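Retrying initialization instead of failing permanently is commonly done with capped exponential backoff. A minimal sketch, assuming a hypothetical initializer (`flaky_init` here) that raises on transient errors; the attempt count, delays, and jitter strategy are illustrative and not Neo's actual settings. Real code would usually catch only the exception types known to be transient rather than every `Exception`.

```python
import random
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds; re-raise the last error after max_attempts failures.

    The delay ceiling doubles after each failure (capped at max_delay), and the
    actual wait is drawn uniformly from [0, ceiling] ("full jitter") so that many
    replicas do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    raise RuntimeError("unreachable")


# Usage: an initializer that fails twice with a transient error, then succeeds.
calls: Dict[str, int] = {"n": 0}


def flaky_init() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient Graph API error")
    return "workers ready"


result = retry_with_backoff(flaky_init, sleep=lambda s: None)  # no real waiting in the demo
print(result, calls["n"])  # workers ready 3
```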

MCP & Search Fixes

Resolves multiple issues with the Model Context Protocol (MCP) integration, ACL-based access control, and NER entity search.

  • ACL filtering fix -- Shares configured with acl_override_mode=everyone now correctly grant access instead of denying when resolved principals don't match the user
  • Auth persistence -- MCP OAuth RSA signing keys are now persisted to the database, so authentication tokens survive service restarts
  • Group-based access control -- User group memberships are now fetched via Microsoft Graph at token validation time, enabling group-based ACL matching through MCP
  • NER search improvements -- Fixed entity search 422 error, added relevance ranking (exact match, entity density, text length), pagination support, and per-file deduplication
  • Share status transitions -- NER worker now correctly transitions share status from PROCESSING → READY when all work completes
  • OAuthProvider abstraction -- Introduced OAuthProvider ABC for future Keycloak/generic OIDC provider support
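The NER search ranking can be sketched as a sort on a composite key, followed by per-file deduplication and page slicing. This is an illustration under assumptions: the `Hit` fields, the tie-break order, the density formula (mentions per 1,000 characters), and the choice that shorter texts win remaining ties are all hypothetical; the release notes name the criteria but not how Neo weights them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hit:
    file_id: int
    entity: str         # entity text as stored
    mention_count: int  # mentions of this entity in the file
    text_length: int    # characters of extracted text in the file


def rank_entity_hits(hits: List[Hit], query: str, page: int = 1, page_size: int = 10) -> List[Hit]:
    q = query.casefold()

    def key(h: Hit) -> Tuple[bool, float, int]:
        exact = h.entity.casefold() == q
        density = h.mention_count / max(h.text_length, 1) * 1000
        # Exact matches first, then higher density; shorter texts win remaining ties.
        return (not exact, -density, h.text_length)

    seen = set()
    deduped: List[Hit] = []
    for h in sorted(hits, key=key):
        if h.file_id not in seen:  # keep only the best-ranked hit per file
            seen.add(h.file_id)
            deduped.append(h)
    start = (page - 1) * page_size
    return deduped[start:start + page_size]


hits = [
    Hit(1, "NetApp Inc", 5, 10_000),
    Hit(2, "NetApp", 2, 1_000),
    Hit(2, "netapp", 1, 1_000),
    Hit(3, "NetApp", 1, 5_000),
]
print([h.file_id for h in rank_entity_hits(hits, "netapp")])  # [2, 3, 1]
```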

Container Images

All images are available at ghcr.io/netapp. Pull with docker pull ghcr.io/netapp/<image>:4.0.3p9.

Image | Platforms | Description
netapp-neo-api:4.0.3p9 | amd64, arm64 | REST API + MCP transport
netapp-neo-worker:4.0.3p9 | amd64, arm64 | Background processing
netapp-neo-extractor:4.0.3p9 | amd64, arm64 | Content extraction (CPU)
netapp-neo-extractor-cuda:4.0.3p9 | amd64, arm64 | Content extraction (NVIDIA GPU)
netapp-neo-extractor-rocm:4.0.3p9 | amd64 | Content extraction (AMD GPU)
netapp-neo-ner:4.0.3p9 | amd64, arm64 | Named Entity Recognition

Quick Start

docker pull ghcr.io/netapp/netapp-neo-api:4.0.3p9
docker pull ghcr.io/netapp/netapp-neo-worker:4.0.3p9
docker pull ghcr.io/netapp/netapp-neo-extractor:4.0.3p9
docker pull ghcr.io/netapp/netapp-neo-ner:4.0.3p9

Quality

  • Full end-to-end testing passed on both CPU and GPU (CUDA) builds
  • Validated across S3, NFS, and SMB storage backends
  • 1,467+ files processed with NER entity detection (67,000+ entities on CPU, 8,700+ on GPU)
  • Zero import errors across all Cython-compiled services
  • Zero CUDA errors on NVIDIA RTX PRO 4000 Blackwell SFF

NetApp Project Neo v4.0.3p10

09 Apr 07:54
612e53b

What's New

Fix: Worker Startup Hang on Large Datasets

Resolved a critical issue where all worker containers would hang indefinitely during initialization on systems with large file inventories (100k+ files), preventing all data ingestion, ACL resolution, and file processing.

  • Root cause -- The ACL resolution backfill query used a correlated LIKE subquery on cast JSON text (metadata::text LIKE '%' || id || '%'), resulting in O(n*m) complexity that could take hours on large datasets. With all worker replicas running this query simultaneously, database contention compounded the problem.
  • Fix -- The backfill is now deferred to a non-blocking background task that runs after workers are fully initialized. The query has been rewritten to use an efficient JSONB key lookup ((metadata::jsonb)->>'file_id') that is indexable and orders of magnitude faster.
  • Impact -- Workers now start in seconds regardless of dataset size, immediately beginning file processing, ACL resolution, and Graph uploads.
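The cost difference behind the backfill fix can be illustrated outside the database. The first function mirrors the old pattern: for every id, substring-scan every serialized metadata blob, giving O(n·m) comparisons, and substring matching can also yield false positives (id "1" matches a blob containing "12"). The second parses each blob once and does hashed lookups, the in-memory analogue of the keyed (metadata::jsonb)->>'file_id' lookup. The function names and record shapes are hypothetical simplifications of what the backfill query does.

```python
import json
from typing import List, Set


def ids_with_metadata_substring(file_ids: List[str], metadata_blobs: List[str]) -> Set[str]:
    """Old pattern, like metadata::text LIKE '%' || id || '%': nested loop, O(n*m)."""
    return {fid for fid in file_ids if any(fid in blob for blob in metadata_blobs)}


def ids_with_metadata_key(file_ids: List[str], metadata_blobs: List[str]) -> Set[str]:
    """New pattern: extract the file_id key once per row, then set lookups, O(n + m)."""
    referenced = {str(json.loads(blob).get("file_id")) for blob in metadata_blobs}
    return {fid for fid in file_ids if fid in referenced}


blobs = [json.dumps({"file_id": "12", "acl": "resolved"}), json.dumps({"file_id": "7"})]
ids = ["1", "7", "12"]
print(sorted(ids_with_metadata_substring(ids, blobs)))  # ['1', '12', '7']  ("1" is a false positive)
print(sorted(ids_with_metadata_key(ids, blobs)))        # ['12', '7']
```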

Fix: Admin User Creation Decoupled from Worker Init

Admin user creation now runs independently of worker initialization with retry logic, ensuring authentication works even if worker startup encounters transient errors.

Container Images

All images are available at ghcr.io/netapp. Pull with docker pull ghcr.io/netapp/<image>:4.0.3p10.

Image | Platforms | Description
netapp-neo-api:4.0.3p10 | amd64, arm64 | REST API + MCP transport
netapp-neo-worker:4.0.3p10 | amd64, arm64 | Background processing
netapp-neo-extractor:4.0.3p10 | amd64, arm64 | Content extraction (CPU)
netapp-neo-extractor-cuda:4.0.3p10 | amd64, arm64 | Content extraction (NVIDIA GPU)
netapp-neo-extractor-rocm:4.0.3p10 | amd64 | Content extraction (AMD GPU)
netapp-neo-ner:4.0.3p10 | amd64, arm64 | Named Entity Recognition

Quick Start

docker pull ghcr.io/netapp/netapp-neo-api:4.0.3p10
docker pull ghcr.io/netapp/netapp-neo-worker:4.0.3p10
docker pull ghcr.io/netapp/netapp-neo-extractor:4.0.3p10
docker pull ghcr.io/netapp/netapp-neo-ner:4.0.3p10

Quality

  • 1940/1940 end-to-end test work items passing (100% pass rate)
  • Validated across SMB and NFS storage backends (CPU and GPU builds)
  • ACL backfill verified: 10/10 manually-cleared files re-resolved after worker restart