Add --domain option to analyze.sh for domain-specific analysis

JohT · JohT · commit 540a8bd515d6 · 2026-03-31T07:56:35.000+02:00
diff --git a/.github/prompts/plan-addDomainOption.prompt.md b/.github/prompts/plan-addDomainOption.prompt.md
@@ -0,0 +1,72 @@
+# Plan: Add `--domain` option to analyze.sh
+
+Add an optional `--domain <name>` CLI option to `analyze.sh` that selects a single domain (subdirectory of `domains/`) for vertical-slice analysis. When set, only that domain's report scripts run; core reports from `scripts/reports/` and other domains are skipped. The option composes with `--report` (horizontal slice). When omitted, behavior is unchanged.
+
+---
+
+**Steps**
+
+### Phase 1: `analyze.sh` CLI parsing and validation
+
+1. **Add `--domain` to argument parsing** in [analyze.sh](scripts/analysis/analyze.sh) — add default `analysisDomain=""`, add `--domain)` case in the `while` loop, update `usage()`
+2. **Validate the domain name** — POSIX `case` glob pattern `*[!A-Za-z0-9-]*` to reject invalid characters (only if non-empty), resolve `DOMAINS_DIRECTORY="${SCRIPTS_DIR}/../domains"`, check that the `domains/<name>/` subdirectory exists and fail with a clear error message otherwise, then set `ANALYSIS_DOMAIN` (plain variable, no `export`)
+3. **Log the domain** in the "Start Analysis" group alongside `analysisReportCompilation`, `settingsProfile`, `exploreMode`
+
+### Phase 2: Report compilation scripts — respect `ANALYSIS_DOMAIN` (*all steps parallel*)
+
+4. **Modify [CsvReports.sh](scripts/reports/compilations/CsvReports.sh)** — when `ANALYSIS_DOMAIN` is set, replace `for directory in "${REPORTS_SCRIPT_DIR}" "${DOMAINS_DIRECTORY}"` with just `"${DOMAINS_DIRECTORY}/${ANALYSIS_DOMAIN}"`
+5. **Modify [PythonReports.sh](scripts/reports/compilations/PythonReports.sh)** — same pattern (Python env activation still runs)
+6. **Modify [VisualizationReports.sh](scripts/reports/compilations/VisualizationReports.sh)** — same pattern
+7. **Modify [MarkdownReports.sh](scripts/reports/compilations/MarkdownReports.sh)** — same pattern
+8. **Modify [JupyterReports.sh](scripts/reports/compilations/JupyterReports.sh)** — add early return with log message when `ANALYSIS_DOMAIN` is set (domains don't include Jupyter notebooks in the compilation path)
+9. **No changes to `AllReports.sh`** (chains the above scripts, filtering cascades) or **`DatabaseCsvExportReports.sh`** (special case, invoked explicitly only)
+
+### Phase 3: GitHub Actions workflow (*depends on Phase 1*)
+
+10. **Add `domain` input** to [public-analyze-code-graph.yml](.github/workflows/public-analyze-code-graph.yml) — optional string, default `''`. In the "Analyze" step, prepend `--domain <value>` to `analysis-arguments` when non-empty
+
+### Phase 4: Documentation (*depends on Phase 1*)
+
+11. **Update [analyze.sh](scripts/analysis/analyze.sh) header comments** — add `# Note:` block for `--domain` matching existing style
+12. **Update [COMMANDS.md](COMMANDS.md)** — add `--domain` under "Command Line Options" and document the `ANALYSIS_DOMAIN` shell variable (set by `analyze.sh`, not exported) that passes the selection to the report compilation scripts
+13. **Update [GETTING_STARTED.md](GETTING_STARTED.md)** — add example: `./../../scripts/analysis/analyze.sh --domain anomaly-detection`
+
+### Phase 5: Test scripts (*depends on Phases 1–2*)
+
+14. **Create [testAnalyzeDomainOption.sh](scripts/testAnalyzeDomainOption.sh)** — follow existing conventions (`testCloneGitRepository.sh` pattern: `tearDown`, `successful`, `fail`, `info` helpers; temp directory with fake `domains/` structure; auto-discovered by `runTests.sh` via `find … -name 'test*.sh'`). Test cases:
+    - Reject `--domain` with invalid characters (e.g. `../../etc`) → fails at the `case` glob pattern check
+    - Reject `--domain` with nonexistent domain name → fails with error listing available domains
+    - Accept `--domain` with valid name matching a temp subdirectory → passes validation (script then fails at "no artifacts" check, which confirms domain validation succeeded)
+    - No `--domain` given → passes validation unchanged (same late failure)
+
+---
+
+**Relevant files**
+- `scripts/analysis/analyze.sh` — add `--domain` parsing, validation (match pattern of `settingsProfile`), set `ANALYSIS_DOMAIN` (no `export`)
+- `scripts/reports/compilations/CsvReports.sh` — conditionally filter `for directory in ...` loop
+- `scripts/reports/compilations/PythonReports.sh` — same conditional filtering
+- `scripts/reports/compilations/VisualizationReports.sh` — same conditional filtering
+- `scripts/reports/compilations/MarkdownReports.sh` — same conditional filtering
+- `scripts/reports/compilations/JupyterReports.sh` — early exit when `ANALYSIS_DOMAIN` is set
+- `.github/workflows/public-analyze-code-graph.yml` — add `domain` input, pass through
+- `COMMANDS.md` — document `--domain` option and `ANALYSIS_DOMAIN` shell variable
+- `GETTING_STARTED.md` — add usage examples
+- `scripts/testAnalyzeDomainOption.sh` — new test script for `--domain` validation (auto-discovered by `runTests.sh`)
+
+**Verification**
+1. Run `analyze.sh --domain nonexistent` → clear error listing available domains
+2. Run `--domain anomaly-detection --report Csv` → only `anomalyDetectionCsv.sh` runs (no core CSV, no `externalDependenciesCsv.sh`)
+3. Run `--domain anomaly-detection` (default `--report All`) → only anomaly-detection scripts for Csv/Python/Visualization/Markdown; Jupyter skipped
+4. Run without `--domain` → all reports + all domains execute unchanged (backward compat)
+5. Run `--domain "../../etc"` → the `case` glob pattern check rejects it
+6. Run example script with `--domain anomaly-detection` → argument passes through via `"${@}"`
+
+**Decisions**
+- `--domain` and `--report` compose: report selects type (horizontal), domain selects scope (vertical)
+- When `--domain` is set, core reports from `scripts/reports/` are **skipped** — only the domain's scripts run
+- JupyterReports.sh skipped when a domain is selected (no domain-scoped notebooks)
+- Only a single domain selectable (not comma-separated)
+- Propagated via `ANALYSIS_DOMAIN` shell variable (no `export`) from `analyze.sh` to compilation scripts. A shell variable is used instead of script arguments because compilation scripts are `source`d (not subprocesses), positional params would conflict in nested sourcing, and it follows the established convention (`DOMAINS_DIRECTORY`, `REPORTS_SCRIPT_DIR`, etc.)
+- **Not exported** — `export` would leak the variable into all child processes (Python, Java/jQAssistant, Neo4j, npm/node) where it could collide with unrelated programs outside this project's control. Since all compilation scripts are `source`d (same shell), `export` is unnecessary
+- **POSIX-compliant where practical** — prefer `case` glob patterns over `[[ =~ ]]` for validation (e.g. `case "${var}" in *[!A-Za-z0-9-]*) …`), `[ ]` over `[[ ]]` for simple tests, standard parameter expansion, and portable constructs. No new external dependencies. Must run on macOS, Linux, and Windows (Git Bash, WSL). Exception: `${BASH_SOURCE[0]}` (already used throughout the codebase). Follow existing script conventions over strict POSIX when they conflict
+- **Readability over brevity** — no abbreviations in variable names, function names, or messages, even if names feel long (e.g. selectedAnalysisDomain over domain, analysisDomainsDirectory over domainsDir). Follow the existing codebase style (analysisReportCompilation, settingsProfile, REPORT_COMPILATIONS_SCRIPT_DIR, etc.)
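The character check described in step 2 of the plan can be sketched in isolation. This is a minimal illustration, not the code added to `analyze.sh`: the function name `isValidAnalysisDomainName` is hypothetical, and unlike the script (which skips the check for an empty value) this helper treats an empty name as invalid.

```shell
#!/bin/sh
# Hypothetical helper (not part of analyze.sh): succeeds only when the given
# name is non-empty and consists solely of letters, digits, and hyphens.
isValidAnalysisDomainName() {
  case "$1" in
    "") return 1 ;;                  # assumption: empty names are rejected here
    *[!A-Za-z0-9-]*) return 1 ;;     # any character outside the allowed set
    *) return 0 ;;
  esac
}

# Usage: try a valid name and two names containing disallowed characters.
for candidateDomainName in "anomaly-detection" "../../etc" "external dependencies"; do
  if isValidAnalysisDomainName "${candidateDomainName}"; then
    echo "accepted: ${candidateDomainName}"
  else
    echo "rejected: ${candidateDomainName}"
  fi
done
```

Because the pattern rejects `.` and `/`, path traversal attempts such as `../../etc` never reach the later directory existence check.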
diff --git a/.github/workflows/internal-typescript-upload-code-example.yml b/.github/workflows/internal-typescript-upload-code-example.yml
@@ -121,4 +121,5 @@ jobs:
       analysis-name: ${{ needs.prepare-code-to-analyze.outputs.analysis-name }}
       sources-upload-name: ${{ needs.prepare-code-to-analyze.outputs.sources-upload-name }}
       jupyter-pdf: "false"
-      analysis-arguments: "--explore" # Only setup the Graph, do not generate any reports
+      domain: "external-dependencies" # For testing purposes: only run the external-dependencies domain (vertical slice)
+      analysis-arguments: "--report Csv" # For testing purposes: only generate CSV reports
diff --git a/.github/workflows/public-analyze-code-graph.yml b/.github/workflows/public-analyze-code-graph.yml
@@ -73,6 +73,18 @@ on:
         required: false
         type: string
         default: '--profile Neo4j-latest-low-memory'
+      domain:
+        description: >
+          The name of an analysis domain to run.
+          Must match a subdirectory name in the 'domains/' directory
+          (e.g. 'anomaly-detection', 'external-dependencies').
+          When set, only that domain's report scripts run;
+          core reports from 'scripts/reports/' and other domains are skipped.
+          Can be combined with 'analysis-arguments' to further narrow the reports.
+          Default: '' (all domains and reports run unchanged)
+        required: false
+        type: string
+        default: ''
       typescript-scan-heap-memory:
         description: >
           The heap memory size in MB to use for the TypeScript code scans (default=4096).
@@ -252,7 +264,8 @@ jobs:
         working-directory: temp
         run: |
           ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/   /' -e 's/-/|/'
-
+      - name: Assemble domainAnalysisArgument
+        run: echo "domainAnalysisArgument=${{ inputs.domain != '' && format('--domain {0} ', inputs.domain) || '' }}" >> "$GITHUB_ENV"
       - name: (Code Analysis) Analyze ${{ inputs.analysis-name }}
         working-directory: temp/${{ inputs.analysis-name }}
         # Shell type can be skipped if jupyter notebook analysis-results (and therefore conda) aren't needed
@@ -264,7 +277,7 @@ jobs:
           PREPARE_CONDA_ENVIRONMENT: "false" # Had already been done in step with id "prepare-conda-environment".
           USE_VIRTUAL_PYTHON_ENVIRONMENT_VENV: ${{ inputs.use-venv_virtual_python_environment }}
         run: |
-          TYPESCRIPT_SCAN_HEAP_MEMORY=${{ inputs.typescript-scan-heap-memory }} ./../../scripts/analysis/analyze.sh ${{ inputs.analysis-arguments }}
+          TYPESCRIPT_SCAN_HEAP_MEMORY=${{ inputs.typescript-scan-heap-memory }} ./../../scripts/analysis/analyze.sh ${{ env.domainAnalysisArgument }}${{ inputs.analysis-arguments }}
     
       - name: Set artifact name for uploaded analysis results
         id: set-analysis-results-artifact-name
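The workflow expression above prepends `--domain <value> ` (with a trailing space) to `analysis-arguments` only when the `domain` input is non-empty. A rough shell equivalent of that logic, assuming a hypothetical function name `assembleAnalysisArguments` that does not exist in the repository, could look like this:

```shell
#!/bin/sh
# Hypothetical helper mirroring the workflow expression:
#   $1 = value of the "domain" input (may be empty)
#   $2 = value of the "analysis-arguments" input (may be empty)
assembleAnalysisArguments() {
  if [ -n "$1" ]; then
    # Domain given: prefix the remaining arguments with "--domain <value> ".
    printf '%s\n' "--domain $1 $2"
  else
    # No domain given: pass the remaining arguments through unchanged.
    printf '%s\n' "$2"
  fi
}

# Usage with the values from the internal TypeScript example workflow.
assembleAnalysisArguments "external-dependencies" "--report Csv"
# Usage without a domain: the arguments stay as they were.
assembleAnalysisArguments "" "--explore"
```

When the domain is set but `analysis-arguments` is empty, the result keeps the trailing space, which the shell ignores when splitting the command line.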
diff --git a/COMMANDS.md b/COMMANDS.md
@@ -86,6 +86,8 @@ The [analyze.sh](./scripts/analysis/analyze.sh) command comes with these command
 
 - `--explore` activates the "explore" mode where no reports are generated. Furthermore, Neo4j won't be stopped at the end of the script and will therefore continue running.  This makes it easy to just set everything up but then use the running Neo4j server to explore the data manually.
 
+- `--domain anomaly-detection` selects a single analysis domain (a subdirectory of [domains/](./domains/)) to run reports for, following a vertical-slice approach. When set, only that domain's report scripts run; core reports from `scripts/reports/` and other domains are skipped. The domain option composes with `--report` to further narrow down which reports are generated, e.g. `--domain anomaly-detection --report Csv`. When not specified, all domains and reports run unchanged. The selected domain name is passed to report compilation scripts via the shell variable `ANALYSIS_DOMAIN` (set by `analyze.sh`, not exported). Available domains can be found in the [domains/](./domains/) directory.
+
 ### Notes
 
 - Be sure to use Java 21 for Neo4j v2025, Java 17 for v5 and Java 11 for v4. Details see [Neo4j System Requirements / Java](https://neo4j.com/docs/operations-manual/current/installation/requirements/#deployment-requirements-java).
@@ -144,6 +146,22 @@ without report generation use this command:
 ./../../scripts/analysis/analyze.sh --explore
 ```
 
+#### Only run the reports of one specific domain
+
+To only run the reports of a single analysis domain (a vertical slice that skips the core reports and therefore does not need their additional Python or Node.js dependencies):
+
+```shell
+./../../scripts/analysis/analyze.sh --domain anomaly-detection
+```
+
+#### Only run the CSV reports of one specific domain
+
+To further narrow down to only one report type within a specific domain:
+
+```shell
+./../../scripts/analysis/analyze.sh --domain anomaly-detection --report Csv
+```
+
 ## Generate Markdown References
 
 ### Generate Cypher Reference
diff --git a/GETTING_STARTED.md b/GETTING_STARTED.md
@@ -118,6 +118,18 @@ Use these optional command line options as needed:
   ./../../scripts/analysis/analyze.sh --explore
   ```
 
+- Only run the reports of one specific domain (vertical slice):
+
+  ```shell
+  ./../../scripts/analysis/analyze.sh --domain anomaly-detection
+  ```
+
+- Only run the CSV reports of one specific domain:
+
+  ```shell
+  ./../../scripts/analysis/analyze.sh --domain anomaly-detection --report Csv
+  ```
+
 👉 Open your browser and login to your local Neo4j Web UI (`http://localhost:7474/browser`) with "neo4j" as user and the initial password you've chosen.
 
 ## GitHub Actions
diff --git a/scripts/analysis/analyze.sh b/scripts/analysis/analyze.sh
@@ -24,6 +24,12 @@
 #       It activates "explore" mode where no reports are executed and Neo4j keeps running (skip stop step).
 #       This makes it easy to just set everything up but then use the running Neo4j server to explore the data manually.
 
+# Note: The argument "--domain" is optional. The default value is "" (empty = all domains run unchanged).
+#       It selects a single analysis domain (a subdirectory of "domains/") to run reports for, following a vertical-slice approach.
+#       When set, only that domain's report scripts run; core reports from "scripts/reports/" and other domains are skipped.
+#       The domain option can be combined with "--report" e.g. "--domain anomaly-detection --report Csv".
+#       Only a single domain can be selected. The domain name must match a subdirectory of the "domains" directory.
+
 # Note: The script and its sub scripts are designed to be as efficient as possible 
 #       when it comes to subsequent executions.
 #       Existing downloads, installations, scans and processes will be detected.
@@ -44,13 +50,14 @@ LOG_GROUP_END=${LOG_GROUP_END:-"::endgroup::"} # Prefix to end a log group. Defa
 
 # Function to display script usage
 usage() {
-  echo "Usage: $0 [--report <All (default), Csv, Jupyter, Python, Visualization...>] [--profile <Default, Neo4jv5, Neo4jv4,...>] [--explore]"
+  echo "Usage: $0 [--report <All (default), Csv, Jupyter, Python, Visualization...>] [--profile <Default, Neo4jv5, Neo4jv4,...>] [--domain <domain-name>] [--explore]"
   exit 1
 }
 
 # Default values
 analysisReportCompilation="All"
 settingsProfile="Default"
+selectedAnalysisDomain=""
 exploreMode=false
 
 # Parse command line arguments
@@ -69,6 +76,10 @@ while [[ $# -gt 0 ]]; do
       exploreMode=true
       shift
       ;;
+    --domain)
+      selectedAnalysisDomain="$2"
+      shift
+      ;;
     *)
       echo "analyze: Error: Unknown option: ${key}"
       usage
@@ -89,6 +100,16 @@ if ! [[ ${settingsProfile} =~ ^[-[:alnum:]]+$ ]]; then
   exit 1
 fi
 
+# Assure that the selected analysis domain only consists of letters, numbers, and hyphens (if specified).
+if [ -n "${selectedAnalysisDomain}" ]; then
+  case "${selectedAnalysisDomain}" in
+    *[!A-Za-z0-9-]*)
+      echo "analyze: Error: Domain '${selectedAnalysisDomain}' can only contain letters, numbers, and hyphens."
+      exit 1
+      ;;
+  esac
+fi
+
 # Check if there is something to scan and analyze
 if [ ! -d "${ARTIFACTS_DIRECTORY}" ] && [ ! -d "${SOURCE_DIRECTORY}" ] ; then
     echo "analyze: Neither ${ARTIFACTS_DIRECTORY} nor the ${SOURCE_DIRECTORY} directory exist. Please download artifacts/sources first."
@@ -98,6 +119,7 @@ fi
 echo "${LOG_GROUP_START}Start Analysis"
 echo "analyze: analysisReportCompilation=${analysisReportCompilation}"
 echo "analyze: settingsProfile=${settingsProfile}"
+echo "analyze: selectedAnalysisDomain=${selectedAnalysisDomain}"
 echo "analyze: exploreMode=${exploreMode}"
 
 ## Get this "scripts/analysis" directory if not already set
@@ -111,6 +133,24 @@ echo "analyze: ANALYSIS_SCRIPT_DIR=${ANALYSIS_SCRIPT_DIR}"
 SCRIPTS_DIR=${SCRIPTS_DIR:-$(dirname -- "${ANALYSIS_SCRIPT_DIR}")} # Repository directory containing the shell scripts
 echo "analyze: SCRIPTS_DIR=${SCRIPTS_DIR}"
 
+# Resolve the analysis domains directory. Can be overridden by the environment variable DOMAINS_DIRECTORY.
+DOMAINS_DIRECTORY=${DOMAINS_DIRECTORY:-"${SCRIPTS_DIR}/../domains"}
+echo "analyze: DOMAINS_DIRECTORY=${DOMAINS_DIRECTORY}"
+
+# When a specific analysis domain is selected, validate that it matches an existing subdirectory of the domains directory.
+# ANALYSIS_DOMAIN is empty when no domain is selected, causing all domains to run unchanged.
+ANALYSIS_DOMAIN=""
+if [ -n "${selectedAnalysisDomain}" ]; then
+  if [ ! -d "${DOMAINS_DIRECTORY}/${selectedAnalysisDomain}" ]; then
+    availableAnalysisDomains=$(find "${DOMAINS_DIRECTORY}" -mindepth 1 -maxdepth 1 -type d -exec basename {} \; 2>/dev/null | sort | tr '\n' ' ')
+    echo "analyze: Error: Selected domain '${selectedAnalysisDomain}' does not match any subdirectory in ${DOMAINS_DIRECTORY}."
+    echo "analyze: Available domains: ${availableAnalysisDomains}"
+    exit 1
+  fi
+  ANALYSIS_DOMAIN="${selectedAnalysisDomain}"
+  echo "analyze: ANALYSIS_DOMAIN=${ANALYSIS_DOMAIN}"
+fi
+
 # Assure that there is a report compilation script for the given report argument.
 REPORT_COMPILATION_SCRIPT="${SCRIPTS_DIR}/${REPORTS_SCRIPTS_DIRECTORY}/${REPORT_COMPILATIONS_SCRIPTS_DIRECTORY}/${analysisReportCompilation}Reports.sh"
 if [ ! -f "${REPORT_COMPILATION_SCRIPT}" ] ; then
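The error path for an unknown domain can be tried out against a temporary directory. The sketch below reuses the `find` pipeline from the hunk above; the function name `listAvailableAnalysisDomains` and the temporary directory layout are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: prints the names of the direct subdirectories of the
# given directory, sorted and separated by spaces (with a trailing space).
listAvailableAnalysisDomains() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -exec basename {} \; 2>/dev/null | sort | tr '\n' ' '
}

# Usage: build a fake "domains" directory with two domains.
temporaryDomainsDirectory=$(mktemp -d)
mkdir "${temporaryDomainsDirectory}/external-dependencies" "${temporaryDomainsDirectory}/anomaly-detection"

selectedAnalysisDomain="nonexistent"
if [ ! -d "${temporaryDomainsDirectory}/${selectedAnalysisDomain}" ]; then
  echo "Error: Selected domain '${selectedAnalysisDomain}' does not match any subdirectory."
  echo "Available domains: $(listAvailableAnalysisDomains "${temporaryDomainsDirectory}")"
fi

# Clean up the temporary directory again.
rm -rf "${temporaryDomainsDirectory}"
```

Note that `-mindepth` and `-maxdepth` are extensions supported by GNU and BSD `find` rather than part of POSIX.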
diff --git a/scripts/reports/compilations/CsvReports.sh b/scripts/reports/compilations/CsvReports.sh
@@ -29,12 +29,21 @@ echo "${LOG_GROUP_START}$(date +'%Y-%m-%dT%H:%M:%S') Initialize CSV Reports";
 echo "${SCRIPT_NAME}: REPORT_COMPILATIONS_SCRIPT_DIR=${REPORT_COMPILATIONS_SCRIPT_DIR}"
 echo "${SCRIPT_NAME}: REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR}"
 echo "${SCRIPT_NAME}: DOMAINS_DIRECTORY=${DOMAINS_DIRECTORY}"
+echo "${SCRIPT_NAME}: ANALYSIS_DOMAIN=${ANALYSIS_DOMAIN}"
 echo "${LOG_GROUP_END}";
 
 # Run all CSV report scripts (filename ending with Csv.sh) in the REPORTS_SCRIPT_DIR and DOMAINS_DIRECTORY directories.
-for directory in "${REPORTS_SCRIPT_DIR}" "${DOMAINS_DIRECTORY}"; do
+# When a specific analysis domain is selected, only run reports for that domain's directory.
+# Otherwise, run reports from both the general reports directory and all domains.
+if [ -n "${ANALYSIS_DOMAIN}" ]; then
+    analysisReportScriptDirectories=( "${DOMAINS_DIRECTORY}/${ANALYSIS_DOMAIN}" )
+else
+    analysisReportScriptDirectories=( "${REPORTS_SCRIPT_DIR}" "${DOMAINS_DIRECTORY}" )
+fi
+
+for directory in "${analysisReportScriptDirectories[@]}"; do
     if [ ! -d "${directory}" ]; then
-        echo "${SCRIPT_NAME}: Error: Directory ${directory} does not exist. Please check your REPORTS_SCRIPT_DIR and DOMAIN_DIRECTORY settings."
+        echo "${SCRIPT_NAME}: Error: Directory ${directory} does not exist. Please check your REPORTS_SCRIPT_DIR and DOMAINS_DIRECTORY settings."
         exit 1
     fi
 
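The directory selection above boils down to a single decision: one domain directory when `ANALYSIS_DOMAIN` is set, otherwise the core reports directory plus the whole domains directory. A minimal sketch of that decision without Bash arrays follows; the function name `selectReportScriptDirectories` is hypothetical and printing one directory per line is an assumption made for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the directories to search for report scripts.
#   $1 = selected analysis domain (empty means "all")
#   $2 = core reports script directory
#   $3 = domains directory
selectReportScriptDirectories() {
  if [ -n "$1" ]; then
    # Vertical slice: only the selected domain's directory.
    printf '%s\n' "$3/$1"
  else
    # Default: core reports and every domain.
    printf '%s\n' "$2" "$3"
  fi
}

# Usage: with and without a selected domain.
selectReportScriptDirectories "anomaly-detection" "scripts/reports" "domains"
selectReportScriptDirectories "" "scripts/reports" "domains"
```

The commit itself uses a Bash array, which keeps directory names containing spaces intact when iterating with `"${analysisReportScriptDirectories[@]}"`.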
diff --git a/scripts/reports/compilations/JupyterReports.sh b/scripts/reports/compilations/JupyterReports.sh
@@ -36,6 +36,13 @@ echo "${SCRIPT_NAME}: SCRIPTS_DIR=${SCRIPTS_DIR}"
 echo "${SCRIPT_NAME}: JUPYTER_NOTEBOOK_DIRECTORY=${JUPYTER_NOTEBOOK_DIRECTORY}"
 echo "${LOG_GROUP_END}";
 
+# Jupyter Notebook reports are not domain-scoped. Skip them when a specific analysis domain is selected.
+if [ -n "${ANALYSIS_DOMAIN}" ]; then
+    echo "${SCRIPT_NAME}: Skipping Jupyter Notebook reports because a specific analysis domain is selected (ANALYSIS_DOMAIN=${ANALYSIS_DOMAIN})."
+    echo "${SCRIPT_NAME}: Jupyter Notebook reports are not domain-scoped and cannot be run for a specific domain."
+    return 0 2>/dev/null || exit 0
+fi
+
 # Run all jupiter notebooks
 for jupyter_notebook_file in "${JUPYTER_NOTEBOOK_DIRECTORY}"/*.ipynb; do 
     jupyter_notebook_filename=$(basename -- "${jupyter_notebook_file}")
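The `return 0 2>/dev/null || exit 0` line lets the script stop early whether it is sourced or executed directly. The following sketch illustrates both cases with a throwaway script; the helper name `createEarlyExitScript` is hypothetical, and it assumes `bash` is available on the `PATH`.

```shell
#!/bin/sh
# Hypothetical helper: writes a tiny script that stops early to the path in $1.
createEarlyExitScript() {
  cat > "$1" <<'EOF'
echo "before early exit"
return 0 2>/dev/null || exit 0
echo "never reached"
EOF
}

temporaryScript=$(mktemp)
createEarlyExitScript "${temporaryScript}"

# Sourced: "return" succeeds, so only the sourced script stops
# and the calling shell keeps running.
. "${temporaryScript}"
echo "caller continues after sourcing"

# Executed with bash: "return" fails outside a function or sourced script,
# its error message is discarded, and "exit 0" ends the script instead.
bash "${temporaryScript}"
echo "exit status when executed: $?"

rm -f "${temporaryScript}"
```

In both cases the line after the early exit is skipped, which matches how `JupyterReports.sh` is skipped when a domain is selected.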
diff --git a/scripts/reports/compilations/MarkdownReports.sh b/scripts/reports/compilations/MarkdownReports.sh
diff --git a/scripts/reports/compilations/PythonReports.sh b/scripts/reports/compilations/PythonReports.sh
diff --git a/scripts/reports/compilations/VisualizationReports.sh b/scripts/reports/compilations/VisualizationReports.sh
diff --git a/scripts/testAnalyzeDomainOption.sh b/scripts/testAnalyzeDomainOption.sh