Commit 540a8bd

Add --domain option to analyze.sh for domain-specific analysis
1 parent 4de6207 commit 540a8bd

12 files changed, +420 -11 lines changed

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+# Plan: Add `--domain` option to analyze.sh
+
+Add an optional `--domain <name>` CLI option to `analyze.sh` that selects a single domain (subdirectory of `domains/`) for vertical-slice analysis. When set, only that domain's report scripts run; core reports from `scripts/reports/` and other domains are skipped. It composes with `--report` (horizontal slice). When omitted, behavior is unchanged.
+
+---
+
+**Steps**
+
+### Phase 1: `analyze.sh` CLI parsing and validation
+
+1. **Add `--domain` to argument parsing** in [analyze.sh](scripts/analysis/analyze.sh): add the default `selectedAnalysisDomain=""`, add a `--domain)` case in the `while` loop, and update `usage()`
+2. **Validate the domain name**: use the POSIX `case` glob pattern `*[!A-Za-z0-9-]*` to reject invalid characters (only if non-empty), resolve `DOMAINS_DIRECTORY="${SCRIPTS_DIR}/../domains"`, check that the `domains/<name>/` subdirectory exists (with a clear error message otherwise), then set `ANALYSIS_DOMAIN` (plain variable, no `export`)
+3. **Log the domain** in the "Start Analysis" group alongside `analysisReportCompilation`, `settingsProfile`, `exploreMode`
+
+### Phase 2: Report compilation scripts respect `ANALYSIS_DOMAIN` (*all steps parallel*)
+
+4. **Modify [CsvReports.sh](scripts/reports/compilations/CsvReports.sh)**: when `ANALYSIS_DOMAIN` is set, replace `for directory in "${REPORTS_SCRIPT_DIR}" "${DOMAINS_DIRECTORY}"` with just `"${DOMAINS_DIRECTORY}/${ANALYSIS_DOMAIN}"`
+5. **Modify [PythonReports.sh](scripts/reports/compilations/PythonReports.sh)**: same pattern (Python env activation still runs)
+6. **Modify [VisualizationReports.sh](scripts/reports/compilations/VisualizationReports.sh)**: same pattern
+7. **Modify [MarkdownReports.sh](scripts/reports/compilations/MarkdownReports.sh)**: same pattern
+8. **Modify [JupyterReports.sh](scripts/reports/compilations/JupyterReports.sh)**: add an early return with a log message when `ANALYSIS_DOMAIN` is set (domains don't include Jupyter notebooks in the compilation path)
+9. **No changes to `AllReports.sh`** (chains the above scripts, so the filtering cascades) or **`DatabaseCsvExportReports.sh`** (special case, invoked explicitly only)
+
+### Phase 3: GitHub Actions workflow (*depends on Phase 1*)
+
+10. **Add `domain` input** to [public-analyze-code-graph.yml](.github/workflows/public-analyze-code-graph.yml): optional string, default `''`. In the "Analyze" step, prepend `--domain <value>` to `analysis-arguments` when non-empty
+
+### Phase 4: Documentation (*depends on Phase 1*)
+
+11. **Update [analyze.sh](scripts/analysis/analyze.sh) header comments**: add a `# Note:` block for `--domain` matching the existing style
+12. **Update [COMMANDS.md](COMMANDS.md)**: add `--domain` under "Command Line Options" and document the `ANALYSIS_DOMAIN` variable that carries the selection to the compilation scripts
+13. **Update [GETTING_STARTED.md](GETTING_STARTED.md)**: add example: `./../../scripts/analysis/analyze.sh --domain anomaly-detection`
+
+### Phase 5: Test scripts (*depends on Phases 1–2*)
+
+14. **Create [testAnalyzeDomainOption.sh](scripts/testAnalyzeDomainOption.sh)**: follow existing conventions (`testCloneGitRepository.sh` pattern: `tearDown`, `successful`, `fail`, `info` helpers; temp directory with fake `domains/` structure; auto-discovered by `runTests.sh` via `find … -name 'test*.sh'`). Test cases:
+    - Reject `--domain` with invalid characters (e.g. `../../etc`) → fails the character check
+    - Reject `--domain` with nonexistent domain name → fails with error listing available domains
+    - Accept `--domain` with valid name matching a temp subdirectory → passes validation (script then fails at "no artifacts" check, which confirms domain validation succeeded)
+    - No `--domain` given → passes validation unchanged (same late failure)
+
+---
+
+**Relevant files**
+- `scripts/analysis/analyze.sh`: add `--domain` parsing, validation (match pattern of `settingsProfile`), set `ANALYSIS_DOMAIN` (no `export`)
+- `scripts/reports/compilations/CsvReports.sh`: conditionally filter `for directory in ...` loop
+- `scripts/reports/compilations/PythonReports.sh`: same conditional filtering
+- `scripts/reports/compilations/VisualizationReports.sh`: same conditional filtering
+- `scripts/reports/compilations/MarkdownReports.sh`: same conditional filtering
+- `scripts/reports/compilations/JupyterReports.sh`: early exit when `ANALYSIS_DOMAIN` is set
+- `.github/workflows/public-analyze-code-graph.yml`: add `domain` input, pass through
+- `COMMANDS.md`: document `--domain` option and the `ANALYSIS_DOMAIN` variable
+- `GETTING_STARTED.md`: add usage examples
+- `scripts/testAnalyzeDomainOption.sh`: new test script for `--domain` validation (auto-discovered by `runTests.sh`)
+
+**Verification**
+1. Run `analyze.sh --domain nonexistent` → clear error listing available domains
+2. Run `--domain anomaly-detection --report Csv` → only `anomalyDetectionCsv.sh` runs (no core CSV, no `externalDependenciesCsv.sh`)
+3. Run `--domain anomaly-detection` (default `--report All`) → only anomaly-detection scripts for Csv/Python/Visualization/Markdown; Jupyter skipped
+4. Run without `--domain` → all reports + all domains execute unchanged (backward compat)
+5. Run `--domain "../../etc"` → the character check rejects it
+6. Run example script with `--domain anomaly-detection` → argument passes through via `"${@}"`
+
+**Decisions**
+- `--domain` and `--report` compose: report selects type (horizontal), domain selects scope (vertical)
+- When `--domain` is set, core reports from `scripts/reports/` are **skipped**; only the domain's scripts run
+- JupyterReports.sh skipped when a domain is selected (no domain-scoped notebooks)
+- Only a single domain selectable (not comma-separated)
+- Propagated via the `ANALYSIS_DOMAIN` shell variable (no `export`) from `analyze.sh` to the compilation scripts. A shell variable is used instead of script arguments because compilation scripts are `source`d (not run as subprocesses), positional parameters would conflict in nested sourcing, and it follows the established convention (`DOMAINS_DIRECTORY`, `REPORTS_SCRIPT_DIR`, etc.)
+- **Not exported**: `export` would leak the variable into all child processes (Python, Java/jQAssistant, Neo4j, npm/node) where it could collide with unrelated programs outside this project's control. Since all compilation scripts are `source`d (same shell), `export` is unnecessary
+- **POSIX-compliant where practical**: prefer `case` glob patterns over `[[ =~ ]]` for validation (e.g. `case "${var}" in *[!A-Za-z0-9-]*) …`), `[ ]` over `[[ ]]` for simple tests, standard parameter expansion, and portable constructs. No new external dependencies. Must run on macOS, Linux, and Windows (Git Bash, WSL). Exception: `${BASH_SOURCE[0]}` (already used throughout the codebase). Follow existing script conventions over strict POSIX when they conflict
+- **Readability over brevity**: no abbreviations in variable names, function names, or messages, even if names feel long (e.g. `selectedAnalysisDomain` over `domain`, `analysisDomainsDirectory` over `domainsDir`). Follow the existing codebase style (`analysisReportCompilation`, `settingsProfile`, `REPORT_COMPILATIONS_SCRIPT_DIR`, etc.)
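
The character check from step 2 of the plan can be tried in isolation. The following is a minimal sketch, not code from the commit: `isValidAnalysisDomainName` is a hypothetical helper name, and unlike `analyze.sh` (which treats an empty value as "no domain selected") it reports an empty name as invalid.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit).
# Returns 0 when the given name is non-empty and consists only of
# letters, digits, and hyphens. Returns 1 otherwise.
isValidAnalysisDomainName() {
    case "${1}" in
        '' | *[!A-Za-z0-9-]*) return 1 ;; # empty, or contains a character outside the allowed set
        *) return 0 ;;
    esac
}

if isValidAnalysisDomainName "anomaly-detection"; then
    echo "anomaly-detection: accepted"
fi
if ! isValidAnalysisDomainName "../../etc"; then
    echo "../../etc: rejected"
fi
```

Because the pattern rejects `.` and `/`, a path traversal attempt such as `../../etc` never reaches the directory lookup.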

.github/workflows/internal-typescript-upload-code-example.yml

Lines changed: 2 additions & 1 deletion
@@ -121,4 +121,5 @@ jobs:
       analysis-name: ${{ needs.prepare-code-to-analyze.outputs.analysis-name }}
       sources-upload-name: ${{ needs.prepare-code-to-analyze.outputs.sources-upload-name }}
       jupyter-pdf: "false"
-      analysis-arguments: "--explore" # Only setup the Graph, do not generate any reports
+      domain: "external-dependencies" # For testing purposes: only run the external-dependencies domain (vertical slice)
+      analysis-arguments: "--report Csv" # For testing purposes: only generate CSV reports

.github/workflows/public-analyze-code-graph.yml

Lines changed: 15 additions & 2 deletions
@@ -73,6 +73,18 @@ on:
         required: false
         type: string
         default: '--profile Neo4j-latest-low-memory'
+      domain:
+        description: >
+          The name of an analysis domain to run.
+          Must match a subdirectory name in the 'domains/' directory
+          (e.g. 'anomaly-detection', 'external-dependencies').
+          When set, only that domain's report scripts run;
+          core reports from 'scripts/reports/' and other domains are skipped.
+          Can be combined with 'analysis-arguments' to further narrow the reports.
+          Default: '' (all domains and reports run unchanged)
+        required: false
+        type: string
+        default: ''
       typescript-scan-heap-memory:
         description: >
           The heap memory size in MB to use for the TypeScript code scans (default=4096).
@@ -252,7 +264,8 @@ jobs:
         working-directory: temp
         run: |
           ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/ /' -e 's/-/|/'
-
+      - name: Assemble DOMAIN_ARGUMENT
+        run: echo "domainAnalysisArgument=${{ inputs.domain != '' && format('--domain {0} ', inputs.domain) || '' }}" >> $GITHUB_ENV
       - name: (Code Analysis) Analyze ${{ inputs.analysis-name }}
         working-directory: temp/${{ inputs.analysis-name }}
         # Shell type can be skipped if jupyter notebook analysis-results (and therefore conda) aren't needed
@@ -264,7 +277,7 @@ jobs:
           PREPARE_CONDA_ENVIRONMENT: "false" # Had already been done in step with id "prepare-conda-environment".
           USE_VIRTUAL_PYTHON_ENVIRONMENT_VENV: ${{ inputs.use-venv_virtual_python_environment }}
         run: |
-          TYPESCRIPT_SCAN_HEAP_MEMORY=${{ inputs.typescript-scan-heap-memory }} ./../../scripts/analysis/analyze.sh ${{ inputs.analysis-arguments }}
+          TYPESCRIPT_SCAN_HEAP_MEMORY=${{ inputs.typescript-scan-heap-memory }} ./../../scripts/analysis/analyze.sh ${{ env.domainAnalysisArgument }}${{ inputs.analysis-arguments }}

       - name: Set artifact name for uploaded analysis results
         id: set-analysis-results-artifact-name
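
The workflow expression `inputs.domain != '' && format('--domain {0} ', inputs.domain) || ''` yields either an empty string or `--domain <name> ` with a trailing space, so it can be placed directly in front of `analysis-arguments`. A rough shell equivalent, assuming a hypothetical helper name `assembleDomainArgument` that does not exist in the repository, might look like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit) that mirrors the workflow expression.
# Prints "--domain <name> " (note the trailing space) for a non-empty name
# and prints nothing for an empty name.
assembleDomainArgument() {
    if [ -n "${1}" ]; then
        printf '%s' "--domain ${1} "
    fi
}

# The trailing space keeps the two parts separated after concatenation.
echo "analyze.sh $(assembleDomainArgument "external-dependencies")--report Csv"
echo "analyze.sh $(assembleDomainArgument "")--report Csv"
```

The first line prints the command with both options, the second one prints it with `--report Csv` only.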

COMMANDS.md

Lines changed: 18 additions & 0 deletions
@@ -86,6 +86,8 @@ The [analyze.sh](./scripts/analysis/analyze.sh) command comes with these command

 - `--explore` activates the "explore" mode where no reports are generated. Furthermore, Neo4j won't be stopped at the end of the script and will therefore continue running. This makes it easy to just set everything up but then use the running Neo4j server to explore the data manually.

+- `--domain anomaly-detection` selects a single analysis domain (a subdirectory of [domains/](./domains/)) to run reports for, following a vertical-slice approach. When set, only that domain's report scripts run; core reports from `scripts/reports/` and other domains are skipped. The domain option composes with `--report` to further narrow down which reports are generated, e.g. `--domain anomaly-detection --report Csv`. When not specified, all domains and reports run unchanged. The selected domain name is passed to the report compilation scripts via the shell variable `ANALYSIS_DOMAIN` (not exported). Available domains can be found in the [domains/](./domains/) directory.
+
 ### Notes

 - Be sure to use Java 21 for Neo4j v2025, Java 17 for v5 and Java 11 for v4. Details see [Neo4j System Requirements / Java](https://neo4j.com/docs/operations-manual/current/installation/requirements/#deployment-requirements-java).
@@ -144,6 +146,22 @@ without report generation use this command:
 ./../../scripts/analysis/analyze.sh --explore
 ```

+#### Only run the reports of one specific domain
+
+To only run the reports of a single analysis domain (vertical slice, no additional Python or Node.js dependencies for core reports):
+
+```shell
+./../../scripts/analysis/analyze.sh --domain anomaly-detection
+```
+
+#### Only run the CSV reports of one specific domain
+
+To further narrow down to only one report type within a specific domain:
+
+```shell
+./../../scripts/analysis/analyze.sh --domain anomaly-detection --report Csv
+```
+
 ## Generate Markdown References

 ### Generate Cypher Reference

GETTING_STARTED.md

Lines changed: 12 additions & 0 deletions
@@ -118,6 +118,18 @@ Use these optional command line options as needed:
   ./../../scripts/analysis/analyze.sh --explore
   ```

+- Only run the reports of one specific domain (vertical slice):
+
+  ```shell
+  ./../../scripts/analysis/analyze.sh --domain anomaly-detection
+  ```
+
+- Only run the CSV reports of one specific domain:
+
+  ```shell
+  ./../../scripts/analysis/analyze.sh --domain anomaly-detection --report Csv
+  ```
+
 👉 Open your browser and login to your local Neo4j Web UI (`http://localhost:7474/browser`) with "neo4j" as user and the initial password you've chosen.

 ## GitHub Actions
## GitHub Actions

scripts/analysis/analyze.sh

Lines changed: 41 additions & 1 deletion
@@ -24,6 +24,12 @@
 # It activates "explore" mode where no reports are executed and Neo4j keeps running (skip stop step).
 # This makes it easy to just set everything up but then use the running Neo4j server to explore the data manually.

+# Note: The argument "--domain" is optional. The default value is "" (empty = all domains run unchanged).
+# It selects a single analysis domain (a subdirectory of "domains/") to run reports for, following a vertical-slice approach.
+# When set, only that domain's report scripts run; core reports from "scripts/reports/" and other domains are skipped.
+# The domain option can be combined with "--report" e.g. "--domain anomaly-detection --report Csv".
+# Only a single domain can be selected. The domain name must match a subdirectory of the "domains" directory.
+
 # Note: The script and its sub scripts are designed to be as efficient as possible
 # when it comes to subsequent executions.
 # Existing downloads, installations, scans and processes will be detected.
@@ -44,13 +50,14 @@ LOG_GROUP_END=${LOG_GROUP_END:-"::endgroup::"} # Prefix to end a log group. Defa

 # Function to display script usage
 usage() {
-  echo "Usage: $0 [--report <All (default), Csv, Jupyter, Python, Visualization...>] [--profile <Default, Neo4jv5, Neo4jv4,...>] [--explore]"
+  echo "Usage: $0 [--report <All (default), Csv, Jupyter, Python, Visualization...>] [--profile <Default, Neo4jv5, Neo4jv4,...>] [--domain <domain-name>] [--explore]"
   exit 1
 }

 # Default values
 analysisReportCompilation="All"
 settingsProfile="Default"
+selectedAnalysisDomain=""
 exploreMode=false

 # Parse command line arguments
@@ -69,6 +76,10 @@ while [[ $# -gt 0 ]]; do
       exploreMode=true
       shift
       ;;
+    --domain)
+      selectedAnalysisDomain="$2"
+      shift
+      ;;
     *)
       echo "analyze: Error: Unknown option: ${key}"
       usage
@@ -89,6 +100,16 @@ if ! [[ ${settingsProfile} =~ ^[-[:alnum:]]+$ ]]; then
   exit 1
 fi

+# Assure that the selected analysis domain only consists of letters, numbers, and hyphens (if specified).
+if [ -n "${selectedAnalysisDomain}" ]; then
+  case "${selectedAnalysisDomain}" in
+    *[!A-Za-z0-9-]*)
+      echo "analyze: Error: Domain '${selectedAnalysisDomain}' can only contain letters, numbers, and hyphens."
+      exit 1
+      ;;
+  esac
+fi
+
 # Check if there is something to scan and analyze
 if [ ! -d "${ARTIFACTS_DIRECTORY}" ] && [ ! -d "${SOURCE_DIRECTORY}" ] ; then
   echo "analyze: Neither ${ARTIFACTS_DIRECTORY} nor the ${SOURCE_DIRECTORY} directory exist. Please download artifacts/sources first."
@@ -98,6 +119,7 @@ fi
 echo "${LOG_GROUP_START}Start Analysis"
 echo "analyze: analysisReportCompilation=${analysisReportCompilation}"
 echo "analyze: settingsProfile=${settingsProfile}"
+echo "analyze: selectedAnalysisDomain=${selectedAnalysisDomain}"
 echo "analyze: exploreMode=${exploreMode}"

 ## Get this "scripts/analysis" directory if not already set
@@ -111,6 +133,24 @@ echo "analyze: ANALYSIS_SCRIPT_DIR=${ANALYSIS_SCRIPT_DIR}"
 SCRIPTS_DIR=${SCRIPTS_DIR:-$(dirname -- "${ANALYSIS_SCRIPT_DIR}")} # Repository directory containing the shell scripts
 echo "analyze: SCRIPTS_DIR=${SCRIPTS_DIR}"

+# Resolve the analysis domains directory. Can be overridden by the environment variable DOMAINS_DIRECTORY.
+DOMAINS_DIRECTORY=${DOMAINS_DIRECTORY:-"${SCRIPTS_DIR}/../domains"}
+echo "analyze: DOMAINS_DIRECTORY=${DOMAINS_DIRECTORY}"
+
+# When a specific analysis domain is selected, validate that it matches an existing subdirectory of the domains directory.
+# ANALYSIS_DOMAIN is empty when no domain is selected, causing all domains to run unchanged.
+ANALYSIS_DOMAIN=""
+if [ -n "${selectedAnalysisDomain}" ]; then
+  if [ ! -d "${DOMAINS_DIRECTORY}/${selectedAnalysisDomain}" ]; then
+    availableAnalysisDomains=$(find "${DOMAINS_DIRECTORY}" -mindepth 1 -maxdepth 1 -type d -exec basename {} \; 2>/dev/null | sort | tr '\n' ' ')
+    echo "analyze: Error: Selected domain '${selectedAnalysisDomain}' does not match any subdirectory in ${DOMAINS_DIRECTORY}."
+    echo "analyze: Available domains: ${availableAnalysisDomains}"
+    exit 1
+  fi
+  ANALYSIS_DOMAIN="${selectedAnalysisDomain}"
+  echo "analyze: ANALYSIS_DOMAIN=${ANALYSIS_DOMAIN}"
+fi
+
 # Assure that there is a report compilation script for the given report argument.
 REPORT_COMPILATION_SCRIPT="${SCRIPTS_DIR}/${REPORTS_SCRIPTS_DIRECTORY}/${REPORT_COMPILATIONS_SCRIPTS_DIRECTORY}/${analysisReportCompilation}Reports.sh"
 if [ ! -f "${REPORT_COMPILATION_SCRIPT}" ] ; then
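
The "available domains" hint in the error message is built from the immediate subdirectories of the domains directory. The pipeline can be tried on its own with the sketch below. `listAvailableAnalysisDomains` is a hypothetical wrapper name, not a function in the commit. Note that `-mindepth` and `-maxdepth` are not part of POSIX `find`, although both GNU and BSD (macOS) `find` support them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the commit) around the pipeline used in analyze.sh.
# Prints the names of all immediate subdirectories of the given directory,
# sorted and separated by spaces (with a trailing space). Plain files are ignored.
listAvailableAnalysisDomains() {
    find "${1}" -mindepth 1 -maxdepth 1 -type d -exec basename {} \; 2>/dev/null | sort | tr '\n' ' '
}

# Usage with a temporary, fake domains directory
exampleDomainsDirectory=$(mktemp -d)
mkdir -p "${exampleDomainsDirectory}/external-dependencies" "${exampleDomainsDirectory}/anomaly-detection"
touch "${exampleDomainsDirectory}/README.md"
echo "Available domains: $(listAvailableAnalysisDomains "${exampleDomainsDirectory}")"
rm -rf "${exampleDomainsDirectory}"
```

Since standard error is discarded, a missing domains directory simply results in an empty list instead of an additional `find` error message.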

scripts/reports/compilations/CsvReports.sh

Lines changed: 11 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -29,12 +29,21 @@ echo "${LOG_GROUP_START}$(date +'%Y-%m-%dT%H:%M:%S') Initialize CSV Reports";
2929
echo "${SCRIPT_NAME}: REPORT_COMPILATIONS_SCRIPT_DIR=${REPORT_COMPILATIONS_SCRIPT_DIR}"
3030
echo "${SCRIPT_NAME}: REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR}"
3131
echo "${SCRIPT_NAME}: DOMAINS_DIRECTORY=${DOMAINS_DIRECTORY}"
32+
echo "${SCRIPT_NAME}: ANALYSIS_DOMAIN=${ANALYSIS_DOMAIN}"
3233
echo "${LOG_GROUP_END}";
3334

3435
# Run all CSV report scripts (filename ending with Csv.sh) in the REPORTS_SCRIPT_DIR and DOMAINS_DIRECTORY directories.
35-
for directory in "${REPORTS_SCRIPT_DIR}" "${DOMAINS_DIRECTORY}"; do
36+
# When a specific analysis domain is selected, only run reports for that domain's directory.
37+
# Otherwise, run reports from both the general reports directory and all domains.
38+
if [ -n "${ANALYSIS_DOMAIN}" ]; then
39+
analysisReportScriptDirectories=( "${DOMAINS_DIRECTORY}/${ANALYSIS_DOMAIN}" )
40+
else
41+
analysisReportScriptDirectories=( "${REPORTS_SCRIPT_DIR}" "${DOMAINS_DIRECTORY}" )
42+
fi
43+
44+
for directory in "${analysisReportScriptDirectories[@]}"; do
3645
if [ ! -d "${directory}" ]; then
37-
echo "${SCRIPT_NAME}: Error: Directory ${directory} does not exist. Please check your REPORTS_SCRIPT_DIR and DOMAIN_DIRECTORY settings."
46+
echo "${SCRIPT_NAME}: Error: Directory ${directory} does not exist. Please check your REPORTS_SCRIPT_DIR and DOMAINS_DIRECTORY settings."
3847
exit 1
3948
fi
4049
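
To see the effect of the directory selection, the following sketch lists the CSV report scripts that would be picked up with and without a selected domain. It is illustrative only: `listCsvReportScripts` is a hypothetical function, and the lookup by the file name suffix `Csv.sh` via `find` is an assumption based on the comment in the hunk above, since the body of the loop is not part of the diff.

```shell
#!/usr/bin/env bash
# Hypothetical function (not part of the commit).
# Arguments: <reports directory> <domains directory> <selected domain, may be empty>
# Prints the file names of all matching CSV report scripts, one per line.
listCsvReportScripts() {
    local reportsDirectory="${1}"
    local domainsDirectory="${2}"
    local analysisDomain="${3}"
    local directories
    if [ -n "${analysisDomain}" ]; then
        directories=( "${domainsDirectory}/${analysisDomain}" )
    else
        directories=( "${reportsDirectory}" "${domainsDirectory}" )
    fi
    local directory
    for directory in "${directories[@]}"; do
        # Assumption: report scripts are identified by the suffix "Csv.sh".
        find "${directory}" -type f -name '*Csv.sh' -exec basename {} \;
    done | sort
}

# Usage with a temporary, fake directory structure
exampleRoot=$(mktemp -d)
mkdir -p "${exampleRoot}/reports" "${exampleRoot}/domains/anomaly-detection" "${exampleRoot}/domains/external-dependencies"
touch "${exampleRoot}/reports/OverviewCsv.sh" \
      "${exampleRoot}/domains/anomaly-detection/anomalyDetectionCsv.sh" \
      "${exampleRoot}/domains/external-dependencies/externalDependenciesCsv.sh"
listCsvReportScripts "${exampleRoot}/reports" "${exampleRoot}/domains" "anomaly-detection"
rm -rf "${exampleRoot}"
```

With `anomaly-detection` selected, only `anomalyDetectionCsv.sh` is listed. With an empty third argument, all three scripts are listed.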

scripts/reports/compilations/JupyterReports.sh

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,13 @@ echo "${SCRIPT_NAME}: SCRIPTS_DIR=${SCRIPTS_DIR}"
3636
echo "${SCRIPT_NAME}: JUPYTER_NOTEBOOK_DIRECTORY=${JUPYTER_NOTEBOOK_DIRECTORY}"
3737
echo "${LOG_GROUP_END}";
3838

39+
# Jupyter Notebook reports are not domain-scoped. Skip them when a specific analysis domain is selected.
40+
if [ -n "${ANALYSIS_DOMAIN}" ]; then
41+
echo "${SCRIPT_NAME}: Skipping Jupyter Notebook reports because a specific analysis domain is selected (ANALYSIS_DOMAIN=${ANALYSIS_DOMAIN})."
42+
echo "${SCRIPT_NAME}: Jupyter Notebook reports are not domain-scoped and cannot be run for a specific domain."
43+
return 0 2>/dev/null || exit 0
44+
fi
45+
3946
# Run all jupiter notebooks
4047
for jupyter_notebook_file in "${JUPYTER_NOTEBOOK_DIRECTORY}"/*.ipynb; do
4148
jupyter_notebook_filename=$(basename -- "${jupyter_notebook_file}")
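
The line `return 0 2>/dev/null || exit 0` lets the same script stop early whether it is sourced or executed directly: `return` succeeds in a sourced file, and when the file is executed it fails (with its error message suppressed) so that `exit 0` takes over. The sketch below demonstrates this with a temporary script. All file and variable names in it are illustrative and not taken from the repository.

```shell
#!/usr/bin/env bash
# Sketch only: writes a tiny script that uses the idiom and runs it both ways.
skipExampleScript=$(mktemp)
cat > "${skipExampleScript}" <<'EOF'
echo "before skip"
return 0 2>/dev/null || exit 0
echo "never reached"
EOF

# Sourced (inside a subshell to capture the output): "return" ends the sourced file.
sourcedOutput=$( . "${skipExampleScript}" )
# Executed as its own process: "return" fails silently, "exit 0" ends the script.
executedOutput=$( bash "${skipExampleScript}" )
executedExitCode=$?
rm -f "${skipExampleScript}"

echo "sourced:  ${sourcedOutput}"
echo "executed: ${executedOutput} (exit code ${executedExitCode})"
```

In both cases only `before skip` is printed, and the executed variant ends with exit code 0.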
