Merged
72 changes: 72 additions & 0 deletions .github/prompts/plan-addDomainOption.prompt.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
# Plan: Add `--domain` option to analyze.sh

Add an optional `--domain <name>` CLI option to `analyze.sh` that selects a single domain (subdirectory of `domains/`) for vertical-slice analysis. When set, only that domain's report scripts run; core reports from `scripts/reports/` and other domains are skipped. Composes naturally with `--report` (horizontal slice). When omitted, behavior is unchanged.

---

**Steps**

### Phase 1: `analyze.sh` CLI parsing and validation

1. **Add `--domain` to argument parsing** in [analyze.sh](scripts/analysis/analyze.sh) — add default `selectedAnalysisDomain=""`, add a `--domain)` case to the `while` loop, update `usage()`
2. **Validate the domain name** — use the POSIX `case` glob pattern `*[!A-Za-z0-9-]*` to reject invalid characters (only if non-empty), resolve `DOMAINS_DIRECTORY="${SCRIPTS_DIR}/../domains"`, check that the `domains/<name>/` subdirectory exists with a clear error message, then set `ANALYSIS_DOMAIN` (plain variable, no `export`)
3. **Log the domain** in the "Start Analysis" group alongside `analysisReportCompilation`, `settingsProfile`, `exploreMode`
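
Steps 1–2 hinge on the character validation. A minimal, self-contained sketch of that check (the function name `is_valid_domain_name` is illustrative only; the real script may inline the `case` statement):

```shell
#!/bin/sh
# Sketch of the character validation proposed in step 2.
# The function name is hypothetical; analyze.sh may inline this logic.
is_valid_domain_name() {
    case "$1" in
        ''|*[!A-Za-z0-9-]*) return 1 ;; # empty, or contains a forbidden character
        *) return 0 ;;
    esac
}

is_valid_domain_name "anomaly-detection" && echo "accepted"    # → accepted
is_valid_domain_name "../../etc" || echo "rejected"            # → rejected (slashes and dots)
is_valid_domain_name "name_with_underscore" || echo "rejected" # → rejected (underscore)
```

The `case` glob works in any POSIX shell (`dash`, `ash`), unlike the bash-only equivalent `[[ "$1" =~ ^[A-Za-z0-9-]+$ ]]`.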

### Phase 2: Report compilation scripts — respect `ANALYSIS_DOMAIN` (*all steps parallel*)

4. **Modify [CsvReports.sh](scripts/reports/compilations/CsvReports.sh)** — when `ANALYSIS_DOMAIN` is set, replace `for directory in "${REPORTS_SCRIPT_DIR}" "${DOMAINS_DIRECTORY}"` with just `"${DOMAINS_DIRECTORY}/${ANALYSIS_DOMAIN}"`
5. **Modify [PythonReports.sh](scripts/reports/compilations/PythonReports.sh)** — same pattern (Python env activation still runs)
6. **Modify [VisualizationReports.sh](scripts/reports/compilations/VisualizationReports.sh)** — same pattern
7. **Modify [MarkdownReports.sh](scripts/reports/compilations/MarkdownReports.sh)** — same pattern
8. **Modify [JupyterReports.sh](scripts/reports/compilations/JupyterReports.sh)** — add early return with log message when `ANALYSIS_DOMAIN` is set (domains don't include Jupyter notebooks in the compilation path)
9. **No changes to `AllReports.sh`** (chains the above scripts, filtering cascades) or **`DatabaseCsvExportReports.sh`** (special case, invoked explicitly only)

### Phase 3: GitHub Actions workflow (*depends on Phase 1*)

10. **Add `domain` input** to [public-analyze-code-graph.yml](.github/workflows/public-analyze-code-graph.yml) — optional string, default `''`. In the "Analyze" step, prepend `--domain <value>` to `analysis-arguments` when non-empty

### Phase 4: Documentation (*depends on Phase 1*)

11. **Update [analyze.sh](scripts/analysis/analyze.sh) header comments** — add `# Note:` block for `--domain` matching existing style
12. **Update [COMMANDS.md](COMMANDS.md)** — add `--domain` under "Command Line Options" and document the `ANALYSIS_DOMAIN` shell variable alongside the other overrideable variables
13. **Update [GETTING_STARTED.md](GETTING_STARTED.md)** — add example: `./../../scripts/analysis/analyze.sh --domain anomaly-detection`

### Phase 5: Test scripts (*depends on Phases 1–2*)

14. **Create [testAnalyzeDomainOption.sh](scripts/testAnalyzeDomainOption.sh)** — follow existing conventions (`testCloneGitRepository.sh` pattern: `tearDown`, `successful`, `fail`, `info` helpers; temp directory with fake `domains/` structure; auto-discovered by `runTests.sh` via `find … -name 'test*.sh'`). Test cases:
- Reject `--domain` with invalid characters (e.g. `../../etc`) → fails at the character validation (`case` glob pattern)
- Reject `--domain` with nonexistent domain name → fails with error listing available domains
- Accept `--domain` with valid name matching a temp subdirectory → passes validation (script then fails at "no artifacts" check, which confirms domain validation succeeded)
- No `--domain` given → passes validation unchanged (same late failure)
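
A skeleton of how such a test script might be structured, with stand-in helpers (the real `tearDown`/`successful`/`fail`/`info` helpers in `testCloneGitRepository.sh` may differ in detail):

```shell
#!/bin/sh
# Hypothetical skeleton for testAnalyzeDomainOption.sh.
# The helper implementations below are simplified stand-ins.
successful() { printf 'SUCCESS: %s\n' "$1"; }
fail() { printf 'FAIL: %s\n' "$1"; exit 1; }

# Fake domains/ structure in a temporary directory, removed on tearDown.
temporaryTestDirectory=$(mktemp -d)
tearDown() { rm -rf "${temporaryTestDirectory}"; }
mkdir -p "${temporaryTestDirectory}/domains/anomaly-detection"

# Test case: a valid domain name matches an existing subdirectory.
if [ -d "${temporaryTestDirectory}/domains/anomaly-detection" ]; then
    successful "valid domain name matches an existing subdirectory"
else
    fail "expected the fake domain directory to exist"
fi

tearDown
```

Each of the four test cases listed above would invoke `analyze.sh` against such a fake structure and assert on its exit code and error message.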

---

**Relevant files**
- `scripts/analysis/analyze.sh` — add `--domain` parsing, validation (match pattern of `settingsProfile`), set `ANALYSIS_DOMAIN` (no `export`)
- `scripts/reports/compilations/CsvReports.sh` — conditionally filter `for directory in ...` loop
- `scripts/reports/compilations/PythonReports.sh` — same conditional filtering
- `scripts/reports/compilations/VisualizationReports.sh` — same conditional filtering
- `scripts/reports/compilations/MarkdownReports.sh` — same conditional filtering
- `scripts/reports/compilations/JupyterReports.sh` — early exit when `ANALYSIS_DOMAIN` is set
- `.github/workflows/public-analyze-code-graph.yml` — add `domain` input, pass through
- `COMMANDS.md` — document the `--domain` option and the `ANALYSIS_DOMAIN` shell variable
- `GETTING_STARTED.md` — add usage examples
- `scripts/testAnalyzeDomainOption.sh` — new test script for `--domain` validation (auto-discovered by `runTests.sh`)

**Verification**
1. Run `analyze.sh --domain nonexistent` → clear error listing available domains
2. Run `--domain anomaly-detection --report Csv` → only `anomalyDetectionCsv.sh` runs (no core CSV, no `externalDependenciesCsv.sh`)
3. Run `--domain anomaly-detection` (default `--report All`) → only anomaly-detection scripts for Csv/Python/Visualization/Markdown; Jupyter skipped
4. Run without `--domain` → all reports + all domains execute unchanged (backward compat)
5. Run `--domain "../../etc"` → the character validation (`case` glob pattern) rejects it
6. Run example script with `--domain anomaly-detection` → argument passes through via `"${@}"`

**Decisions**
- `--domain` and `--report` compose: report selects type (horizontal), domain selects scope (vertical)
- When `--domain` is set, core reports from `scripts/reports/` are **skipped** — only the domain's scripts run
- JupyterReports.sh skipped when a domain is selected (no domain-scoped notebooks)
- Only a single domain selectable (not comma-separated)
- Propagated via the `ANALYSIS_DOMAIN` shell variable (no `export`) from `analyze.sh` to the compilation scripts — a shared shell variable (rather than script arguments) because the compilation scripts are `source`d (not run as subprocesses), positional parameters would conflict in nested sourcing, and it follows the established convention (`DOMAINS_DIRECTORY`, `REPORTS_SCRIPT_DIR`, etc.)
- **Not exported** — `export` would leak the variable into all child processes (Python, Java/jQAssistant, Neo4j, npm/node) where it could collide with unrelated programs outside this project's control. Since all compilation scripts are `source`d (same shell), `export` is unnecessary
- **POSIX-compliant where practical** — prefer `case` glob patterns over `[[ =~ ]]` for validation (e.g. `case "${var}" in *[!A-Za-z0-9-]*) …`), `[ ]` over `[[ ]]` for simple tests, standard parameter expansion, and portable constructs. No new external dependencies. Must run on macOS, Linux, and Windows (Git Bash, WSL). Exception: `${BASH_SOURCE[0]}` (already used throughout the codebase). Follow existing script conventions over strict POSIX when they conflict
- **Readability over brevity** — no abbreviations in variable names, function names, or messages, even if names feel long (e.g. `selectedAnalysisDomain` over `domain`, `analysisDomainsDirectory` over `domainsDir`). Follow the existing codebase style (`analysisReportCompilation`, `settingsProfile`, `REPORT_COMPILATIONS_SCRIPT_DIR`, etc.)
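
The propagation and no-`export` decisions can be illustrated with a small, self-contained experiment (hypothetical file, not part of the repository):

```shell
#!/bin/sh
# Illustration: a plain variable set in the current shell is visible to
# source'd scripts, while an un-exported variable is invisible to children.
ANALYSIS_DOMAIN="anomaly-detection"

# Simulate a source'd compilation script that reads the variable.
sourcedCompilationScript=$(mktemp)
printf 'echo "sourced script sees: ${ANALYSIS_DOMAIN}"\n' > "${sourcedCompilationScript}"
. "${sourcedCompilationScript}"   # → sourced script sees: anomaly-detection
rm -f "${sourcedCompilationScript}"

# A child process does not see it, because the variable is not exported.
sh -c 'echo "child process sees: ${ANALYSIS_DOMAIN:-<nothing>}"'  # → child process sees: <nothing>
```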
@@ -121,4 +121,5 @@ jobs:
analysis-name: ${{ needs.prepare-code-to-analyze.outputs.analysis-name }}
sources-upload-name: ${{ needs.prepare-code-to-analyze.outputs.sources-upload-name }}
jupyter-pdf: "false"
analysis-arguments: "--explore" # Only setup the Graph, do not generate any reports
domain: "external-dependencies" # For testing purposes: only run the external-dependencies domain (vertical slice)
analysis-arguments: "--report Csv" # For testing purposes: only generate CSV reports
17 changes: 15 additions & 2 deletions .github/workflows/public-analyze-code-graph.yml
@@ -73,6 +73,18 @@ on:
required: false
type: string
default: '--profile Neo4j-latest-low-memory'
domain:
description: >
The name of an analysis domain to run.
Must match a subdirectory name in the 'domains/' directory
(e.g. 'anomaly-detection', 'external-dependencies').
When set, only that domain's report scripts run;
core reports from 'scripts/reports/' and other domains are skipped.
Can be combined with 'analysis-arguments' to further narrow the reports.
Default: '' (all domains and reports run unchanged)
required: false
type: string
default: ''
typescript-scan-heap-memory:
description: >
The heap memory size in MB to use for the TypeScript code scans (default=4096).
@@ -252,7 +264,8 @@ jobs:
working-directory: temp
run: |
ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/ /' -e 's/-/|/'

- name: Assemble DOMAIN_ARGUMENT
run: echo "domainAnalysisArgument=${{ inputs.domain != '' && format('--domain {0} ', inputs.domain) || '' }}" >> $GITHUB_ENV
- name: (Code Analysis) Analyze ${{ inputs.analysis-name }}
working-directory: temp/${{ inputs.analysis-name }}
# Shell type can be skipped if jupyter notebook analysis-results (and therefore conda) aren't needed
Expand All @@ -264,7 +277,7 @@ jobs:
PREPARE_CONDA_ENVIRONMENT: "false" # Had already been done in step with id "prepare-conda-environment".
USE_VIRTUAL_PYTHON_ENVIRONMENT_VENV: ${{ inputs.use-venv_virtual_python_environment }}
run: |
TYPESCRIPT_SCAN_HEAP_MEMORY=${{ inputs.typescript-scan-heap-memory }} ./../../scripts/analysis/analyze.sh ${{ inputs.analysis-arguments }}
TYPESCRIPT_SCAN_HEAP_MEMORY=${{ inputs.typescript-scan-heap-memory }} ./../../scripts/analysis/analyze.sh ${{ env.domainAnalysisArgument }}${{ inputs.analysis-arguments }}

- name: Set artifact name for uploaded analysis results
id: set-analysis-results-artifact-name
18 changes: 18 additions & 0 deletions COMMANDS.md
@@ -86,6 +88,8 @@ The [analyze.sh](./scripts/analysis/analyze.sh) command comes with these command

- `--explore` activates the "explore" mode where no reports are generated. Furthermore, Neo4j won't be stopped at the end of the script and will therefore continue running. This makes it easy to just set everything up but then use the running Neo4j server to explore the data manually.

- `--domain anomaly-detection` selects a single analysis domain (a subdirectory of [domains/](./domains/)) to run reports for, following a vertical-slice approach. When set, only that domain's report scripts run; core reports from `scripts/reports/` and other domains are skipped. The domain option composes with `--report` to further narrow down which reports are generated, e.g. `--domain anomaly-detection --report Csv`. When not specified, all domains and reports run unchanged. The selected domain name is passed to the report compilation scripts via the shell variable `ANALYSIS_DOMAIN`.

### Notes

- Be sure to use Java 21 for Neo4j v2025, Java 17 for v5 and Java 11 for v4. Details see [Neo4j System Requirements / Java](https://neo4j.com/docs/operations-manual/current/installation/requirements/#deployment-requirements-java).
@@ -144,6 +146,22 @@ without report generation use this command:
./../../scripts/analysis/analyze.sh --explore
```

#### Only run the reports of one specific domain

To only run the reports of a single analysis domain (vertical slice), without the additional Python or Node.js dependencies required by the core reports:

```shell
./../../scripts/analysis/analyze.sh --domain anomaly-detection
```

#### Only run the CSV reports of one specific domain

To further narrow down to only one report type within a specific domain:

```shell
./../../scripts/analysis/analyze.sh --domain anomaly-detection --report Csv
```

## Generate Markdown References

### Generate Cypher Reference
12 changes: 12 additions & 0 deletions GETTING_STARTED.md
@@ -118,6 +118,18 @@ Use these optional command line options as needed:
./../../scripts/analysis/analyze.sh --explore
```
- Only run the reports of one specific domain (vertical slice):
```shell
./../../scripts/analysis/analyze.sh --domain anomaly-detection
```
- Only run the CSV reports of one specific domain:
```shell
./../../scripts/analysis/analyze.sh --domain anomaly-detection --report Csv
```
👉 Open your browser and login to your local Neo4j Web UI (`http://localhost:7474/browser`) with "neo4j" as user and the initial password you've chosen.

## GitHub Actions
62 changes: 61 additions & 1 deletion scripts/analysis/analyze.sh
@@ -24,6 +24,12 @@
# It activates "explore" mode where no reports are executed and Neo4j keeps running (skip stop step).
# This makes it easy to just set everything up but then use the running Neo4j server to explore the data manually.

# Note: The argument "--domain" is optional. The default value is "" (empty = all domains run unchanged).
# It selects a single analysis domain (a subdirectory of "domains/") to run reports for, following a vertical-slice approach.
# When set, only that domain's report scripts run; core reports from "scripts/reports/" and other domains are skipped.
# The domain option can be combined with "--report" e.g. "--domain anomaly-detection --report Csv".
# Only a single domain can be selected. The domain name must match a subdirectory of the "domains" directory.

# Note: The script and its sub scripts are designed to be as efficient as possible
# when it comes to subsequent executions.
# Existing downloads, installations, scans and processes will be detected.
@@ -44,31 +50,56 @@ LOG_GROUP_END=${LOG_GROUP_END:-"::endgroup::"} # Prefix to end a log group. Defa

# Function to display script usage
usage() {
echo "Usage: $0 [--report <All (default), Csv, Jupyter, Python, Visualization...>] [--profile <Default, Neo4jv5, Neo4jv4,...>] [--explore]"
echo "Usage: $0 [--report <All (default), Csv, Jupyter, Python, Visualization...>] [--profile <Default, Neo4jv5, Neo4jv4,...>] [--domain <domain-name>] [--explore]"
exit 1
}

# Default values
analysisReportCompilation="All"
settingsProfile="Default"
selectedAnalysisDomain=""
exploreMode=false

# Function to check if a parameter value is missing (either empty or another option starting with --)
is_missing_value_parameter() {
case "${2:-}" in
''|--*) return 0 ;; # missing value
*) return 1 ;; # value is present
esac
}

# Parse command line arguments
while [[ $# -gt 0 ]]; do
key="$1"
case $key in
--report)
if is_missing_value_parameter "$1" "$2"; then
echo "analyze: Error: --report requires a value."
usage
fi
analysisReportCompilation="$2"
shift
;;
--profile)
if is_missing_value_parameter "$1" "$2"; then
echo "analyze: Error: --profile requires a value."
usage
fi
settingsProfile="$2"
shift
;;
--explore)
exploreMode=true
shift
;;
--domain)
if is_missing_value_parameter "$1" "$2"; then
echo "analyze: Error: --domain requires a value."
usage
fi
selectedAnalysisDomain="$2"
shift
;;
*)
echo "analyze: Error: Unknown option: ${key}"
usage
@@ -89,6 +120,16 @@ if ! [[ ${settingsProfile} =~ ^[-[:alnum:]]+$ ]]; then
exit 1
fi

# Assure that the selected analysis domain only consists of letters, numbers, and hyphens (if specified).
if [ -n "${selectedAnalysisDomain}" ]; then
case "${selectedAnalysisDomain}" in
*[!A-Za-z0-9-]*)
echo "analyze: Error: Domain '${selectedAnalysisDomain}' can only contain letters, numbers, and hyphens."
exit 1
;;
esac
fi

# Check if there is something to scan and analyze
if [ ! -d "${ARTIFACTS_DIRECTORY}" ] && [ ! -d "${SOURCE_DIRECTORY}" ] ; then
echo "analyze: Neither ${ARTIFACTS_DIRECTORY} nor the ${SOURCE_DIRECTORY} directory exist. Please download artifacts/sources first."
@@ -98,6 +139,7 @@ fi
echo "${LOG_GROUP_START}Start Analysis"
echo "analyze: analysisReportCompilation=${analysisReportCompilation}"
echo "analyze: settingsProfile=${settingsProfile}"
echo "analyze: selectedAnalysisDomain=${selectedAnalysisDomain}"
echo "analyze: exploreMode=${exploreMode}"

## Get this "scripts/analysis" directory if not already set
@@ -111,6 +153,24 @@ echo "analyze: ANALYSIS_SCRIPT_DIR=${ANALYSIS_SCRIPT_DIR}"
SCRIPTS_DIR=${SCRIPTS_DIR:-$(dirname -- "${ANALYSIS_SCRIPT_DIR}")} # Repository directory containing the shell scripts
echo "analyze: SCRIPTS_DIR=${SCRIPTS_DIR}"

# Resolve the analysis domains directory. Can be overridden by the environment variable DOMAINS_DIRECTORY.
DOMAINS_DIRECTORY=${DOMAINS_DIRECTORY:-"${SCRIPTS_DIR}/../domains"}
echo "analyze: DOMAINS_DIRECTORY=${DOMAINS_DIRECTORY}"

# When a specific analysis domain is selected, validate that it matches an existing subdirectory of the domains directory.
# ANALYSIS_DOMAIN is empty when no domain is selected, causing all domains to run unchanged.
ANALYSIS_DOMAIN=""
if [ -n "${selectedAnalysisDomain}" ]; then
if [ ! -d "${DOMAINS_DIRECTORY}/${selectedAnalysisDomain}" ]; then
availableAnalysisDomains=$(find "${DOMAINS_DIRECTORY}" -mindepth 1 -maxdepth 1 -type d -exec basename {} \; 2>/dev/null | sort | tr '\n' ' ')
echo "analyze: Error: Selected domain '${selectedAnalysisDomain}' does not match any subdirectory in ${DOMAINS_DIRECTORY}."
echo "analyze: Available domains: ${availableAnalysisDomains}"
exit 1
fi
ANALYSIS_DOMAIN="${selectedAnalysisDomain}"
echo "analyze: ANALYSIS_DOMAIN=${ANALYSIS_DOMAIN}"
fi

# Assure that there is a report compilation script for the given report argument.
REPORT_COMPILATION_SCRIPT="${SCRIPTS_DIR}/${REPORTS_SCRIPTS_DIRECTORY}/${REPORT_COMPILATIONS_SCRIPTS_DIRECTORY}/${analysisReportCompilation}Reports.sh"
if [ ! -f "${REPORT_COMPILATION_SCRIPT}" ] ; then
13 changes: 11 additions & 2 deletions scripts/reports/compilations/CsvReports.sh
@@ -29,12 +29,21 @@ echo "${LOG_GROUP_START}$(date +'%Y-%m-%dT%H:%M:%S') Initialize CSV Reports";
echo "${SCRIPT_NAME}: REPORT_COMPILATIONS_SCRIPT_DIR=${REPORT_COMPILATIONS_SCRIPT_DIR}"
echo "${SCRIPT_NAME}: REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR}"
echo "${SCRIPT_NAME}: DOMAINS_DIRECTORY=${DOMAINS_DIRECTORY}"
echo "${SCRIPT_NAME}: ANALYSIS_DOMAIN=${ANALYSIS_DOMAIN}"
echo "${LOG_GROUP_END}";

# Run all CSV report scripts (filename ending with Csv.sh) in the REPORTS_SCRIPT_DIR and DOMAINS_DIRECTORY directories.
for directory in "${REPORTS_SCRIPT_DIR}" "${DOMAINS_DIRECTORY}"; do
# When a specific analysis domain is selected, only run reports for that domain's directory.
# Otherwise, run reports from both the general reports directory and all domains.
if [ -n "${ANALYSIS_DOMAIN}" ]; then
analysisReportScriptDirectories=( "${DOMAINS_DIRECTORY}/${ANALYSIS_DOMAIN}" )
else
analysisReportScriptDirectories=( "${REPORTS_SCRIPT_DIR}" "${DOMAINS_DIRECTORY}" )
fi

for directory in "${analysisReportScriptDirectories[@]}"; do
if [ ! -d "${directory}" ]; then
echo "${SCRIPT_NAME}: Error: Directory ${directory} does not exist. Please check your REPORTS_SCRIPT_DIR and DOMAIN_DIRECTORY settings."
echo "${SCRIPT_NAME}: Error: Directory ${directory} does not exist. Please check your REPORTS_SCRIPT_DIR and DOMAINS_DIRECTORY settings."
exit 1
fi

7 changes: 7 additions & 0 deletions scripts/reports/compilations/JupyterReports.sh
@@ -36,6 +36,13 @@ echo "${SCRIPT_NAME}: SCRIPTS_DIR=${SCRIPTS_DIR}"
echo "${SCRIPT_NAME}: JUPYTER_NOTEBOOK_DIRECTORY=${JUPYTER_NOTEBOOK_DIRECTORY}"
echo "${LOG_GROUP_END}";

# Jupyter Notebook reports are not domain-scoped. Skip them when a specific analysis domain is selected.
if [ -n "${ANALYSIS_DOMAIN}" ]; then
echo "${SCRIPT_NAME}: Skipping Jupyter Notebook reports because a specific analysis domain is selected (ANALYSIS_DOMAIN=${ANALYSIS_DOMAIN})."
echo "${SCRIPT_NAME}: Jupyter Notebook reports are not domain-scoped and cannot be run for a specific domain."
return 0 2>/dev/null || exit 0
fi

# Run all Jupyter notebooks
for jupyter_notebook_file in "${JUPYTER_NOTEBOOK_DIRECTORY}"/*.ipynb; do
jupyter_notebook_filename=$(basename -- "${jupyter_notebook_file}")