Skip to content

feat: Add Centralized FileService Functionality - BED-8307#2794

Open
mykeelium wants to merge 26 commits into
mainfrom
mcuomo/BED-8307
Open

feat: Add Centralized FileService Functionality - BED-8307#2794
mykeelium wants to merge 26 commits into
mainfrom
mcuomo/BED-8307

Conversation

@mykeelium
Copy link
Copy Markdown
Contributor

@mykeelium mykeelium commented May 19, 2026

Description

This change builds out a more robust file service system that BHCE can use. In doing so a central interface has been created so that extensions can be built on this service. In addition a FileServiceResolver is added so at different parts of the system, a developer can resolve the necessary FileService for their functionality.

Motivation and Context

Resolves BED-8307

How Has This Been Tested?

This has been tested by running the application and ensuring the existing functionality of files is supported. In addition, tests have been created to document and guard both new and updated functionality.

Screenshots (optional):

Types of changes

  • Chore (a change that does not modify the application functionality)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

Summary by CodeRabbit

  • New Features

    • Implemented storage abstraction layer enabling unified file operations across ingest, retained, and collector file locations
  • Improvements

    • Enhanced ingest file retention with dedicated storage service and streamlined file retrieval workflows
    • Strengthened orphan file cleanup with time-based pruning, recursive directory handling, and improved path validation
    • Refined temporary file management and validation for ingest processing with concurrent validation support

Review Change Stack

@mykeelium mykeelium self-assigned this May 19, 2026
@mykeelium mykeelium added the enhancement New feature or request label May 19, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 19, 2026

📝 Walkthrough

Walkthrough

This PR introduces a comprehensive storage abstraction layer to replace direct filesystem access with a resolver-based file service pattern. The refactoring threads storage dependencies through bootstrap initialization, updates upload and ingest processing pipelines, refactors retained file and orphan cleanup operations, and wires file service resolvers through API and daemon components.

Changes

Storage Layer Integration

Layer / File(s) Summary
Storage abstraction and LocalStore implementation
cmd/api/src/services/storage/storage.go, cmd/api/src/services/storage/localstore.go, cmd/api/src/services/storage/localstore_test.go, cmd/api/src/services/storage/storage_test.go, cmd/api/src/services/storage/mocks/fs.go
Introduces the new storage package with public interfaces (Storage, FileService, FileServiceResolver), shared data types (FileInfo, WriteOptions, ListOptions), a complete LocalStore implementation using os.Root for path confinement and atomic file writes, and comprehensive gomock-based mocks. Test coverage includes operations across Put/Get/Delete/List/Copy/Move with context cancellation, path safety validation, and edge cases.
Bootstrap initialization and runtime dependencies
cmd/api/src/bootstrap/initializer.go, cmd/api/src/bootstrap/util.go, cmd/api/src/cmd/bhapi/main.go, cmd/api/src/services/entrypoint.go, cmd/api/src/services/entrypoint_test.go, cmd/api/src/bootstrap/server_test.go
Introduces RuntimeDependencies type containing FileServiceResolver, RuntimeDependenciesConstructor callback, and CreateRuntimeDependencies factory that initializes default file services and validates retained file service availability. Updates InitializerLogic signature to accept and thread runtime dependencies through daemon startup. Wires into bootstrap initializer and test fixture.
Upload and ingest file handling via storage
cmd/api/src/services/upload/upload.go, cmd/api/src/services/upload/upload_test.go, cmd/api/src/api/v2/fileingest.go, cmd/api/src/api/v2/fileingest_test.go
Refactors SaveIngestFile to accept context.Context and storage.FileService instead of filesystem location. Introduces WriteAndValidateFile for concurrent streaming validation using io.Pipe and goroutines, with automatic cleanup on failures. Updates V2 fileingest to resolve ingest FileService via resolver and replaces os.Remove with fileService.DeleteFile.
Retained ingest file handling via storage
cmd/api/src/api/tools/ingest.go, cmd/api/src/api/tools/ingest_test.go, cmd/api/src/daemons/api/toolapi/api.go
Refactors IngestControl to depend on FileService instead of filesystem configuration. Adds retainedArchiveName for tar entry sanitization and writeRetainedFileToTar for streaming files into tar archives. Updates FetchRetainedIngestFiles to list and stream files via storage and DisableIngestFileRetention to delete asynchronously via storage service with proper lock management. Tests verify path sanitization, tar writing, directory skipping, and lock conflict behavior.
Graphify ingest storage extraction and processing
cmd/api/src/services/graphify/ingest_storage.go, cmd/api/src/services/graphify/ingest_storage_test.go, cmd/api/src/services/graphify/service.go, cmd/api/src/services/graphify/tasks.go, cmd/api/src/services/graphify/tasks_integration_test.go, cmd/api/src/services/graphify/graphify_integration_test.go
Introduces storage-backed ingest helpers: SpoolToScratch (copy stored file to scratch), OpenScratchReadSeeker (open spooled file for reading), WriteArchiveFileToStorage (extract ZIP entry to storage with UTF-8 normalization), and ExtractIngestFiles (orchestrate JSON/ZIP extraction with per-entry error collection). Refactors ProcessIngestFile and processSingleFile to use scratch workflow. Updates GraphifyService to store fileServiceResolver and wire it through constructors. Includes tests covering spooling, archive extraction, error handling, and integration scenarios.
Orphan file cleanup refactoring with storage support
cmd/api/src/daemons/datapipe/cleanup.go, cmd/api/src/daemons/datapipe/cleanup_internal_test.go, cmd/api/src/daemons/datapipe/cleanup_test.go, cmd/api/src/daemons/datapipe/pipeline.go, cmd/api/src/daemons/datapipe/pipeline_integration_test.go, cmd/api/src/daemons/datapipe/datapipe_integration_test.go, cmd/api/src/daemons/datapipe/delete_integration_test.go
Extends OrphanFileSweeper to accept scratch root path and excluded storage prefixes. Implements ClearLocalIngestScratch for time-based scratch cleanup, clearStoredIngestFiles for storage-backed orphan deletion with prefix exclusion, and clearLegacyLocalIngestFiles for legacy local temp cleanup. Updates Clear to accept FileService and execute multi-stage cleanup flow. Adds path normalization and validation helpers. Updates pipeline to resolve ingest file service and pass into cleanup. Comprehensive test coverage includes normalization, exclusion logic, stored/legacy/scratch cleanup, and context cancellation.
API and daemon component wiring
cmd/api/src/api/v2/model.go, cmd/api/src/api/registration/registration.go, cmd/api/src/api/v2/collectors.go, cmd/api/src/api/v2/collectors_test.go, cmd/api/src/test/lab/fixtures/api.go
Replaces FileService fs.Service with FileServiceResolver storage.FileServiceResolver in V2 Resources. Updates NewResources to accept and wire resolver. Updates collectors handlers to resolve FileServiceCollectors before reading files. Updates registration to pass resolver into v2.NewResources. Wires resolver through test fixtures and test mocks. Removes tests for deprecated fs service mocks and replaces with storage resolver mocks.
Configuration and legacy service removal
cmd/api/src/config/config.go, cmd/api/src/services/fs/fs.go, cmd/api/src/services/fs/mocks/fs.go, cmd/api/src/services/upload/streamdecoder.go, packages/go/schemagen/generator/sql.go, go.mod
Adds ScratchDirectory() method to Configuration returning ingest_scratch path. Removes deprecated fs.Service interface and Client implementation along with all mocks (87 lines). Updates error message construction in streamdecoder.go and SQL generation in schemagen to use fmt.Fprintf instead of fmt.Sprintf+WriteString. Updates AWS SDK indirect dependencies to newer versions in go.mod.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch mcuomo/BED-8307

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
cmd/api/src/daemons/datapipe/pipeline_integration_test.go (1)

51-54: ⚡ Quick win

Hoist the new ingest storage variables into each function’s top var (...) block.

The new ingestLocalStore, err, and ingestFileService initializations are currently split from the existing declaration block; this file now mixes both styles repeatedly.

Suggested pattern
 func TestDeleteData_Sourceless(t *testing.T) {
 	var (
 		ctx = context.Background()
@@
 		files = []string{
 			"base.json",
 		}
+		ingestLocalStore *storage.LocalStore
+		ingestFileService storage.FileService
+		err error
 	)
-	ingestLocalStore, err := storage.NewLocalStore(testSuite.WorkDir)
+	ingestLocalStore, err = storage.NewLocalStore(testSuite.WorkDir)
 	require.NoError(t, err, "error creating ingest local store")
 	ingestFileService := storage.NewFileService(ingestLocalStore)

As per coding guidelines **/*.go: "Group variable initializations in a var ( ... ) block and hoist them to the top of the function when possible".

Also applies to: 103-106, 155-158, 206-209

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api/src/daemons/datapipe/pipeline_integration_test.go` around lines 51 -
54, Hoist the newly added variables ingestLocalStore, err, and ingestFileService
into each function's top var (...) block instead of creating them inline; locate
the functions where these appear (references to ingestLocalStore, err,
ingestFileService in pipeline_integration_test.go around the shown snippets and
at the other spots noted) and move their declarations into the existing var
block at the start of the function, then assign them in-place (e.g.,
ingestLocalStore, err = storage.NewLocalStore(...); ingestFileService =
storage.NewFileService(ingestLocalStore)); ensure you do not shadow the
function-level err and keep existing require.NoError calls unchanged.
cmd/api/src/daemons/datapipe/delete_integration_test.go (1)

49-52: ⚡ Quick win

Use the top var (...) block for the new ingest storage declarations.

Each updated test now introduces post-block := initializations for ingest storage state; please keep these grouped with existing declarations at function start for consistency.

As per coding guidelines **/*.go: "Group variable initializations in a var ( ... ) block and hoist them to the top of the function when possible".

Also applies to: 100-103, 148-151, 195-198

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api/src/daemons/datapipe/delete_integration_test.go` around lines 49 -
52, The new ingest storage initializations (ingestLocalStore, ingestFileService
created via storage.NewLocalStore(testSuite.WorkDir) and storage.NewFileService)
must be moved into the top var (...) block of the test function instead of using
post-block := statements; declare them in the existing var block (e.g., var
ingestLocalStore storage.LocalStoreType, ingestFileService
storage.FileServiceType or with their concrete types) and then assign to those
vars (ingestLocalStore, ingestFileService) using = with the results of
storage.NewLocalStore(testSuite.WorkDir) and storage.NewFileService(...) at the
start of the function so all variable declarations are grouped consistently
(apply same change for the other occurrences mentioned around lines 100-103,
148-151, 195-198).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/api/src/api/tools/ingest.go`:
- Around line 72-76: The retainedArchiveName function can still return "." or
".." which are unsafe; modify retainedArchiveName to sanitize further by
computing base := path.Base(filePath) after cleaning/trimPrefix and if base is
"." or base is ".." (or filePath starts with "../") return an empty string (or
otherwise signal the entry should be skipped) instead of returning
path.Base(filePath) directly; update callers that rely on retainedArchiveName to
skip entries when it returns an empty string so tar entries with "."/".." are
dropped.

In `@cmd/api/src/bootstrap/initializer.go`:
- Around line 70-72: s.RuntimeDependencies is called unguarded in Launch (or
initializer) which will panic if the field is nil; add a nil-check before
invoking s.RuntimeDependencies and return a clear error if it is nil (e.g.,
"missing RuntimeDependencies function") instead of calling it. Specifically,
before the existing call, verify that s.RuntimeDependencies != nil and only then
call s.RuntimeDependencies(ctx, s.Configuration, databaseConnections); if nil,
return an error indicating the runtime-dependencies hook is not provided so
callers get a proper error rather than a panic.

In `@cmd/api/src/daemons/datapipe/cleanup.go`:
- Around line 98-99: The normalization of tempDirectoryRootPath and
scratchDirectoryRootPath using strings.TrimSuffix(...,
string(filepath.Separator)) can turn "/" into "" and cause cleanup to operate on
an empty base; update the normalization in cleanup.go to first detect if the
value is exactly string(filepath.Separator) (or os.PathSeparator as string) and
leave it unchanged, otherwise trim the trailing separator—apply this logic to
both tempDirectoryRootPath and scratchDirectoryRootPath (the variables
referenced in the diff) so root "/" is preserved and only non-root paths have
their trailing separator removed.
- Around line 232-239: The code currently computes logicalPath with
path.Clean(file.Path) but leaves a leading slash so it won't match keys in
expectedFiles; update the normalization in the cleanup routine (where
logicalPath is set and checked against isExcludedStoragePath and the
expectedFiles map) to remove any leading slash (e.g., trim a leading "/" after
path.Clean) so comparisons use the same canonical form as keys stored in
expectedFiles.

In `@cmd/api/src/services/entrypoint.go`:
- Around line 98-100: PreMigrationDaemons and Entrypoint currently dereference
deps.FileServiceResolver without checking for nil; add an explicit nil guard
(check deps.FileServiceResolver == nil) at the start of both functions
(PreMigrationDaemons and Entrypoint) and return a clear formatted error (e.g.,
fmt.Errorf("missing FileServiceResolver in RuntimeDependencies")) so the
functions fail gracefully instead of panicking when RuntimeDependencies is
incomplete.

---

Nitpick comments:
In `@cmd/api/src/daemons/datapipe/delete_integration_test.go`:
- Around line 49-52: The new ingest storage initializations (ingestLocalStore,
ingestFileService created via storage.NewLocalStore(testSuite.WorkDir) and
storage.NewFileService) must be moved into the top var (...) block of the test
function instead of using post-block := statements; declare them in the existing
var block (e.g., var ingestLocalStore storage.LocalStoreType, ingestFileService
storage.FileServiceType or with their concrete types) and then assign to those
vars (ingestLocalStore, ingestFileService) using = with the results of
storage.NewLocalStore(testSuite.WorkDir) and storage.NewFileService(...) at the
start of the function so all variable declarations are grouped consistently
(apply same change for the other occurrences mentioned around lines 100-103,
148-151, 195-198).

In `@cmd/api/src/daemons/datapipe/pipeline_integration_test.go`:
- Around line 51-54: Hoist the newly added variables ingestLocalStore, err, and
ingestFileService into each function's top var (...) block instead of creating
them inline; locate the functions where these appear (references to
ingestLocalStore, err, ingestFileService in pipeline_integration_test.go around
the shown snippets and at the other spots noted) and move their declarations
into the existing var block at the start of the function, then assign them
in-place (e.g., ingestLocalStore, err = storage.NewLocalStore(...);
ingestFileService = storage.NewFileService(ingestLocalStore)); ensure you do not
shadow the function-level err and keep existing require.NoError calls unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: ed5db286-a133-42cc-a696-08847ed6f9c0

📥 Commits

Reviewing files that changed from the base of the PR and between bf183d7 and cd0604e.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (42)
  • cmd/api/src/api/registration/registration.go
  • cmd/api/src/api/tools/ingest.go
  • cmd/api/src/api/tools/ingest_test.go
  • cmd/api/src/api/v2/collectors.go
  • cmd/api/src/api/v2/collectors_test.go
  • cmd/api/src/api/v2/fileingest.go
  • cmd/api/src/api/v2/fileingest_test.go
  • cmd/api/src/api/v2/model.go
  • cmd/api/src/bootstrap/initializer.go
  • cmd/api/src/bootstrap/server_test.go
  • cmd/api/src/bootstrap/util.go
  • cmd/api/src/cmd/bhapi/main.go
  • cmd/api/src/config/config.go
  • cmd/api/src/daemons/api/toolapi/api.go
  • cmd/api/src/daemons/datapipe/cleanup.go
  • cmd/api/src/daemons/datapipe/cleanup_internal_test.go
  • cmd/api/src/daemons/datapipe/cleanup_test.go
  • cmd/api/src/daemons/datapipe/datapipe_integration_test.go
  • cmd/api/src/daemons/datapipe/delete_integration_test.go
  • cmd/api/src/daemons/datapipe/pipeline.go
  • cmd/api/src/daemons/datapipe/pipeline_integration_test.go
  • cmd/api/src/services/entrypoint.go
  • cmd/api/src/services/entrypoint_test.go
  • cmd/api/src/services/fs/fs.go
  • cmd/api/src/services/fs/mocks/fs.go
  • cmd/api/src/services/graphify/graphify_integration_test.go
  • cmd/api/src/services/graphify/ingest_storage.go
  • cmd/api/src/services/graphify/ingest_storage_test.go
  • cmd/api/src/services/graphify/service.go
  • cmd/api/src/services/graphify/tasks.go
  • cmd/api/src/services/graphify/tasks_integration_test.go
  • cmd/api/src/services/storage/localstore.go
  • cmd/api/src/services/storage/localstore_test.go
  • cmd/api/src/services/storage/mocks/fs.go
  • cmd/api/src/services/storage/storage.go
  • cmd/api/src/services/storage/storage_test.go
  • cmd/api/src/services/upload/streamdecoder.go
  • cmd/api/src/services/upload/upload.go
  • cmd/api/src/services/upload/upload_test.go
  • cmd/api/src/test/lab/fixtures/api.go
  • go.mod
  • packages/go/schemagen/generator/sql.go
💤 Files with no reviewable changes (2)
  • cmd/api/src/services/fs/mocks/fs.go
  • cmd/api/src/services/fs/fs.go

Comment on lines +72 to +76
func retainedArchiveName(filePath string) string {
filePath = path.Clean(strings.TrimPrefix(filePath, "/"))
if filePath == "." || filePath == ".." || strings.HasPrefix(filePath, "../") {
return path.Base(filePath)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Go's path package, what are the return values of path.Base(".") and path.Base(".."), and are "."/".." safe as tar header names?

💡 Result:

In Go's path package (and the related path/filepath package), the function path.Base returns the last element of a path [1][2][3]. For the inputs "." and "..", the return values are as follows: - path.Base(".") returns "." [3] - path.Base("..") returns ".." [3] Regarding their safety as tar header names, "." and ".." are generally considered unsafe or problematic when used as file paths within tar archives [4]. Including these as path components in a tar header can lead to directory traversal vulnerabilities during extraction [4]. If an archive contains entries with ".." in their paths, a naive extraction process might write files outside of the intended target directory [4]. Many modern tar implementations, including GNU tar, have built-in protections that may refuse to extract entries containing ".." or attempt to sanitize them to prevent such security risks [4]. Consequently, it is best practice to avoid using "." or ".." in tar header names to ensure both security and portability [5][4].

Citations:


🏁 Script executed:

cat -n cmd/api/src/api/tools/ingest.go | sed -n '70,85p'

Repository: SpecterOps/BloodHound

Length of output: 704


🏁 Script executed:

# Check if there are other path sanitization or tar safety measures in the file
rg -n "writeRetainedFileToTar|retainedArchiveName" cmd/api/src/api/tools/ingest.go -A 5 -B 2

Repository: SpecterOps/BloodHound

Length of output: 1102


Harden tar entry sanitization for "." / ".." names.

retainedArchiveName can still emit "." or ".." entries for tar archives, which are unsafe and can lead to directory traversal vulnerabilities during extraction. The current logic on line 74–75 attempts to handle these cases but path.Base(".") returns "." and path.Base("..") returns "..", leaving the vulnerability unpatched.

Proposed fix
 func retainedArchiveName(filePath string) string {
 	filePath = path.Clean(strings.TrimPrefix(filePath, "/"))
-	if filePath == "." || filePath == ".." || strings.HasPrefix(filePath, "../") {
-		return path.Base(filePath)
+	if filePath == "." || filePath == ".." || strings.HasPrefix(filePath, "../") {
+		baseName := path.Base(filePath)
+		if baseName == "." || baseName == ".." || baseName == "/" || baseName == "" {
+			return "retained-file"
+		}
+		return baseName
 	}
 	return filePath
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func retainedArchiveName(filePath string) string {
filePath = path.Clean(strings.TrimPrefix(filePath, "/"))
if filePath == "." || filePath == ".." || strings.HasPrefix(filePath, "../") {
return path.Base(filePath)
}
func retainedArchiveName(filePath string) string {
filePath = path.Clean(strings.TrimPrefix(filePath, "/"))
if filePath == "." || filePath == ".." || strings.HasPrefix(filePath, "../") {
baseName := path.Base(filePath)
if baseName == "." || baseName == ".." || baseName == "/" || baseName == "" {
return "retained-file"
}
return baseName
}
return filePath
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api/src/api/tools/ingest.go` around lines 72 - 76, The
retainedArchiveName function can still return "." or ".." which are unsafe;
modify retainedArchiveName to sanitize further by computing base :=
path.Base(filePath) after cleaning/trimPrefix and if base is "." or base is ".."
(or filePath starts with "../") return an empty string (or otherwise signal the
entry should be skipped) instead of returning path.Base(filePath) directly;
update callers that rely on retainedArchiveName to skip entries when it returns
an empty string so tar entries with "."/".." are dropped.

Comment on lines +70 to +72
if dependencies, err := s.RuntimeDependencies(ctx, s.Configuration, databaseConnections); err != nil {
return fmt.Errorf("failed to initialize runtime dependencies: %w", err)
} else {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard RuntimeDependencies before invocation to avoid nil-function panic.

Line 70 calls s.RuntimeDependencies(...) unconditionally. If a caller omits this field, Launch panics instead of returning an error.

Suggested fix
-	if dependencies, err := s.RuntimeDependencies(ctx, s.Configuration, databaseConnections); err != nil {
+	dependencies := RuntimeDependencies{}
+	if s.RuntimeDependencies != nil {
+		if dependencies, err = s.RuntimeDependencies(ctx, s.Configuration, databaseConnections); err != nil {
+			return fmt.Errorf("failed to initialize runtime dependencies: %w", err)
+		}
+	}
-		return fmt.Errorf("failed to initialize runtime dependencies: %w", err)
-	} else {
+	{
 		// Daemons that start prior to blocking db migration
 		if s.PreMigrationDaemons != nil {
 			if daemonInstances, err := s.PreMigrationDaemons(ctx, s.Configuration, databaseConnections, dependencies); err != nil {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api/src/bootstrap/initializer.go` around lines 70 - 72,
s.RuntimeDependencies is called unguarded in Launch (or initializer) which will
panic if the field is nil; add a nil-check before invoking s.RuntimeDependencies
and return a clear error if it is nil (e.g., "missing RuntimeDependencies
function") instead of calling it. Specifically, before the existing call, verify
that s.RuntimeDependencies != nil and only then call s.RuntimeDependencies(ctx,
s.Configuration, databaseConnections); if nil, return an error indicating the
runtime-dependencies hook is not provided so callers get a proper error rather
than a panic.

Comment on lines +98 to +99
tempDirectoryRootPath: strings.TrimSuffix(tempDirectoryRootPath, string(filepath.Separator)),
scratchDirectoryRootPath: strings.TrimSuffix(scratchDirectoryRootPath, string(filepath.Separator)),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve root paths when normalizing configured directories.

strings.TrimSuffix(..., string(filepath.Separator)) turns "/" into "". If either root path is configured as /, cleanup can run against an empty base path and target unintended locations.

Suggested fix
 func NewOrphanFileSweeper(fileOps FileOperations, tempDirectoryRootPath string, scratchDirectoryRootPath string, excludedStoragePrefixes ...string) *OrphanFileSweeper {
+    normalizedTempRoot := filepath.Clean(tempDirectoryRootPath)
+    normalizedScratchRoot := filepath.Clean(scratchDirectoryRootPath)
+
 	return &OrphanFileSweeper{
 		lock:                     &sync.Mutex{},
 		fileOps:                  fileOps,
-		tempDirectoryRootPath:    strings.TrimSuffix(tempDirectoryRootPath, string(filepath.Separator)),
-		scratchDirectoryRootPath: strings.TrimSuffix(scratchDirectoryRootPath, string(filepath.Separator)),
+		tempDirectoryRootPath:    normalizedTempRoot,
+		scratchDirectoryRootPath: normalizedScratchRoot,
 		excludedStoragePrefixes:  normalizeStoragePrefixes(excludedStoragePrefixes),
 	}
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
tempDirectoryRootPath: strings.TrimSuffix(tempDirectoryRootPath, string(filepath.Separator)),
scratchDirectoryRootPath: strings.TrimSuffix(scratchDirectoryRootPath, string(filepath.Separator)),
func NewOrphanFileSweeper(fileOps FileOperations, tempDirectoryRootPath string, scratchDirectoryRootPath string, excludedStoragePrefixes ...string) *OrphanFileSweeper {
normalizedTempRoot := filepath.Clean(tempDirectoryRootPath)
normalizedScratchRoot := filepath.Clean(scratchDirectoryRootPath)
return &OrphanFileSweeper{
lock: &sync.Mutex{},
fileOps: fileOps,
tempDirectoryRootPath: normalizedTempRoot,
scratchDirectoryRootPath: normalizedScratchRoot,
excludedStoragePrefixes: normalizeStoragePrefixes(excludedStoragePrefixes),
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api/src/daemons/datapipe/cleanup.go` around lines 98 - 99, The
normalization of tempDirectoryRootPath and scratchDirectoryRootPath using
strings.TrimSuffix(..., string(filepath.Separator)) can turn "/" into "" and
cause cleanup to operate on an empty base; update the normalization in
cleanup.go to first detect if the value is exactly string(filepath.Separator)
(or os.PathSeparator as string) and leave it unchanged, otherwise trim the
trailing separator—apply this logic to both tempDirectoryRootPath and
scratchDirectoryRootPath (the variables referenced in the diff) so root "/" is
preserved and only non-root paths have their trailing separator removed.

Comment on lines +232 to +239
logicalPath := path.Clean(file.Path)
if s.isExcludedStoragePath(logicalPath) {
continue
}

if _, ok := expectedFiles[logicalPath]; ok {
continue
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize listed storage paths before expected-file lookup.

path.Clean(file.Path) keeps a leading slash, but expected paths are stored without one. A listed file like "/expected.json" won’t match "expected.json" and can be deleted as orphan.

Suggested fix
-		logicalPath := path.Clean(file.Path)
+		logicalPath := path.Clean(strings.TrimPrefix(filepath.ToSlash(file.Path), "/"))
 		if s.isExcludedStoragePath(logicalPath) {
 			continue
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api/src/daemons/datapipe/cleanup.go` around lines 232 - 239, The code
currently computes logicalPath with path.Clean(file.Path) but leaves a leading
slash so it won't match keys in expectedFiles; update the normalization in the
cleanup routine (where logicalPath is set and checked against
isExcludedStoragePath and the expectedFiles map) to remove any leading slash
(e.g., trim a leading "/" after path.Clean) so comparisons use the same
canonical form as keys stored in expectedFiles.

Comment on lines +98 to +100
func PreMigrationDaemons(ctx context.Context, cfg config.Configuration, connections bootstrap.DatabaseConnections[*database.BloodhoundDB, *graph.DatabaseSwitch], deps bootstrap.RuntimeDependencies) ([]daemons.Daemon, error) {
if retainedFileService, err := deps.FileServiceResolver.Resolve(storage.FileServiceRetained); err != nil {
return nil, fmt.Errorf("error resolving FileServiceRetained: %w", err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, find the file to confirm its location
find . -name "entrypoint.go" -path "*/cmd/api/src/services/*" 2>/dev/null

# Get file size to determine reading strategy
wc -l "cmd/api/src/services/entrypoint.go" 2>/dev/null || echo "File not found in expected location"

Repository: SpecterOps/BloodHound

Length of output: 141


🏁 Script executed:

# List structure to understand repository layout
git ls-files cmd/api/src/services/ | head -20

Repository: SpecterOps/BloodHound

Length of output: 976


🏁 Script executed:

# Read the entire entrypoint.go file to understand the context
cat -n "cmd/api/src/services/entrypoint.go"

Repository: SpecterOps/BloodHound

Length of output: 11913


🏁 Script executed:

# Also check the RuntimeDependencies definition to understand FileServiceResolver
rg "type RuntimeDependencies" -A 20 --type go

Repository: SpecterOps/BloodHound

Length of output: 3267


🏁 Script executed:

# Check the FileServiceResolver type definition
rg "type FileServiceResolver" -A 5 --type go

Repository: SpecterOps/BloodHound

Length of output: 525


🏁 Script executed:

# Check how CreateRuntimeDependencies is called and if there are cases where deps could have nil resolver
rg "CreateRuntimeDependencies\|PreMigrationDaemons\|Entrypoint" --type go -B 2 -A 2 | grep -E "(CreateRuntimeDependencies|PreMigrationDaemons|Entrypoint|RuntimeDependencies)" | head -40

Repository: SpecterOps/BloodHound

Length of output: 47


🏁 Script executed:

# Check bootstrap initializer to see how these functions are wired together
cat -n "cmd/api/src/bootstrap/initializer.go" | head -100

Repository: SpecterOps/BloodHound

Length of output: 4696


🏁 Script executed:

# Check if PreMigrationDaemons or Entrypoint are called elsewhere besides the Initializer
rg "PreMigrationDaemons\|Entrypoint" cmd/api/src/services/entrypoint.go -A 1 -B 1

# Verify where these functions might be called from
rg "PreMigrationDaemons|Entrypoint\(" --type go -B 2 | grep -E "\.go-|PreMigrationDaemons|Entrypoint" | head -30

Repository: SpecterOps/BloodHound

Length of output: 2958


🏁 Script executed:

# Check the test file to see how PreMigrationDaemons is being called
cat -n "cmd/api/src/services/entrypoint_test.go"

Repository: SpecterOps/BloodHound

Length of output: 4770


🏁 Script executed:

# Check if the functions are exported (public) and if there are other callers
rg "func (PreMigrationDaemons|Entrypoint)" cmd/api/src/services/entrypoint.go

# Verify the functions are exported by checking if they start with uppercase
grep -n "^func PreMigrationDaemons\|^func Entrypoint" cmd/api/src/services/entrypoint.go

Repository: SpecterOps/BloodHound

Length of output: 958


Add nil guard for FileServiceResolver before use.

Both PreMigrationDaemons and Entrypoint are public functions that access deps.FileServiceResolver without checking for nil. If these functions are called with an incomplete RuntimeDependencies struct (or if initialization fails elsewhere), a nil dereference will panic. Add explicit nil checks to fail gracefully with a clear error message.

Suggested patch
 func PreMigrationDaemons(ctx context.Context, cfg config.Configuration, connections bootstrap.DatabaseConnections[*database.BloodhoundDB, *graph.DatabaseSwitch], deps bootstrap.RuntimeDependencies) ([]daemons.Daemon, error) {
+	if deps.FileServiceResolver == nil {
+		return nil, fmt.Errorf("file service resolver is required")
+	}
+
 	if retainedFileService, err := deps.FileServiceResolver.Resolve(storage.FileServiceRetained); err != nil {
 		return nil, fmt.Errorf("error resolving FileServiceRetained: %w", err)
 	} else {
@@
 func Entrypoint(ctx context.Context, cfg config.Configuration, connections bootstrap.DatabaseConnections[*database.BloodhoundDB, *graph.DatabaseSwitch], deps bootstrap.RuntimeDependencies) ([]daemons.Daemon, error) {
+	if deps.FileServiceResolver == nil {
+		return nil, fmt.Errorf("file service resolver is required")
+	}
+
 	dogtagsService := dogtags.NewDefaultService()

Also applies to: 108-109

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/api/src/services/entrypoint.go` around lines 98 - 100,
PreMigrationDaemons and Entrypoint currently dereference
deps.FileServiceResolver without checking for nil; add an explicit nil guard
(check deps.FileServiceResolver == nil) at the start of both functions
(PreMigrationDaemons and Entrypoint) and return a clear formatted error (e.g.,
fmt.Errorf("missing FileServiceResolver in RuntimeDependencies")) so the
functions fail gracefully instead of panicking when RuntimeDependencies is
incomplete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant