diff --git a/.github/prompts/plan-external_dependencies_domain.prompt.md b/.github/prompts/plan-external_dependencies_domain.prompt.md new file mode 100644 index 000000000..84599c780 --- /dev/null +++ b/.github/prompts/plan-external_dependencies_domain.prompt.md @@ -0,0 +1,160 @@ +## Plan: Create External Dependencies Domain + +Create a new self-contained `domains/external-dependencies/` domain following the `anomaly-detection` reference pattern. Copy all 35 Cypher queries and the existing CSV shell script, convert the charts of both Jupyter notebooks (Java + TypeScript) to a Python SVG chart generator, copy the notebooks into an `explore/` folder with validation disabled, and assemble a Markdown summary report optimized for human and AI agent consumption. No moves or deletions of originals. No graph visualizations yet. + +### Decisions + +- **ExternalDependenciesCsv.sh**: Keep both (original in [scripts/reports/ExternalDependenciesCsv.sh](scripts/reports/ExternalDependenciesCsv.sh) unchanged, copy in domain) +- **Cypher files**: Copy all 35 into domain for self-containment +- **Charts**: ALL charts from both notebooks converted to Python SVG output (~38 total) +- **Package.json queries**: Included in domain +- **Reset folder**: Not included (external type labels are a central enrichment prerequisite) +- **Markdown summary**: Rich, AI-agent-actionable descriptions with architectural guidance +- **Entry point naming**: camelCase prefix (`externalDependenciesCsv.sh`, etc.) + +### Prerequisites (Documented in README, Not Copied) + +The following are provided by the central pipeline and must run *before* this domain: + +1. **Neo4j running** with scanned artifacts loaded +2. **DEPENDS_ON relationships** between types (jQAssistant scan) +3. **Type labels** ([cypher/Types/](cypher/Types/)): base Java types, built-in types, resolved duplicates +4. 
**Weight properties** on DEPENDS_ON ([cypher/DependsOn_Relationship_Weights/](cypher/DependsOn_Relationship_Weights/)): `weight`, `weightInterfaces` +5. **TypeScript enrichment** ([cypher/Typescript_Enrichment/](cypher/Typescript_Enrichment/)): module properties, namespace, `isNodeModule`, `IS_IMPLEMENTED_IN` resolution, npm linking +6. **General enrichment** ([cypher/General_Enrichment/](cypher/General_Enrichment/)): file name/extension properties + +### Domain Directory Structure + +``` +domains/external-dependencies/ +├── README.md +├── externalDependenciesCsv.sh # Entry point: CSV reports (*Csv.sh) +├── externalDependenciesPython.sh # Entry point: Python charts (*Python.sh) +├── externalDependenciesMarkdown.sh # Entry point: Markdown summary (*Markdown.sh) +├── externalDependencyCharts.py # Chart generation: pie, bar, scatter → SVG +├── explore/ +│ ├── ExternalDependenciesJava.ipynb # Original notebook (ValidateAlwaysFalse) +│ └── ExternalDependenciesTypescript.ipynb # Original notebook (ValidateAlwaysFalse) +├── queries/ +│ └── (all 35 .cypher files from cypher/External_Dependencies/) +└── summary/ + ├── externalDependenciesSummary.sh # Markdown assembly logic + └── report.template.md # Main report template +``` + +--- + +### Steps + +#### Phase 1: Scaffolding & Cypher Queries + +**1.1** Create domain directory structure with all subdirectories. + +**1.2** Copy all 35 `.cypher` files from [cypher/External_Dependencies/](cypher/External_Dependencies/) into `domains/external-dependencies/queries/`. + +**1.3** Create `README.md` — domain overview, prerequisites, entry points, folder structure (matching [anomaly-detection README](domains/anomaly-detection/README.md) format). 
+ +#### Phase 2: CSV Entry Point Script + +**2.1** Create `externalDependenciesCsv.sh`: +- Follow exact boilerplate pattern of [anomalyDetectionCsv.sh](domains/anomaly-detection/anomalyDetectionCsv.sh) (BASH_SOURCE/CDPATH, `set -o errexit -o pipefail`, script directory resolution) +- Source `../../scripts/executeQueryFunctions.sh` for `execute_cypher()` and `execute_cypher_queries_until_results()` +- Source `../../scripts/cleanupAfterReportGeneration.sh` for cleanup +- Report directory: `reports/external-dependencies` +- First check/create ExternalType labels, then execute all 24+ queries → CSV files +- Replicate query ordering from [ExternalDependenciesCsv.sh](scripts/reports/ExternalDependenciesCsv.sh) + +#### Phase 3: Python Chart Generation Script (*parallel with Phase 4*) + +**3.1** Create `externalDependencyCharts.py`: +- Follow `Parameters` class pattern from [anomalyDetectionFeaturePlots.py](domains/anomaly-detection/anomalyDetectionFeaturePlots.py): `--report_directory`, `--verbose`, `--language` parameters +- Neo4j connection using `neo4j` Python driver (same pattern as anomaly detection) +- Load and execute Cypher `.cypher` files from `queries/` directory + +**3.2** Data processing functions (extracted from notebooks): +- `group_to_others_below_threshold(data_frame, value_column, name_column, threshold)` → DataFrame +- `filter_values_below_threshold(data_frame, value_column, upper_limit)` → DataFrame +- `explode_pie_slice(data_frame, index_value, base_value, emphasize_value)` → array +- `plot_pie_chart(data_frame, title, file_path)` → saves SVG with percentage labels, legend, "others" explode +- `plot_stacked_bar_chart(pivot_data, title, xlabel, ylabel, file_path)` → saves SVG +- `plot_scatter_with_annotations(data_frame, x, y, size, color, title, file_path, annotations)` → saves SVG + +**3.3** Java chart generation (**22 charts**): +- 16 pie charts: top packages × {types, packages} × {overall, drill-down} × {full, second-level} + spread variants +- 2 
stacked bar charts: external packages per artifact (full + second-level) +- 2 scatter plots: max/median internal package percentage vs external package count + +**3.4** TypeScript chart generation (**~16 charts**): +- 16 pie charts: modules × {elements, modules} × {overall, drill-down} + namespace variants + spread + +**3.5** `main()` function: parse arguments, connect Neo4j, generate charts per language, handle "no data" gracefully. + +#### Phase 4: Python Entry Point Script (*parallel with Phase 3*) + +**4.1** Create `externalDependenciesPython.sh`: +- Follow pattern of [anomalyDetectionPython.sh](domains/anomaly-detection/anomalyDetectionPython.sh) +- Set script directory, report directory +- Call `python externalDependencyCharts.py --language Java --report_directory ...` +- Call again with `--language Typescript` + +#### Phase 5: Markdown Summary + +**5.1** Create `summary/report.template.md`: +- YAML front matter (title, date, model version) +- Section 1: Executive Overview — total external packages/modules, key frameworks identified +- Section 2: Java External Dependencies — most used, spread analysis, per-artifact, aggregated (with `` for tables and SVG references) +- Section 3: TypeScript External Dependencies — modules, namespaces, per-module breakdown, package.json +- Section 4: Architectural Recommendations — Hexagonal Architecture, Anti-Corruption Layer guidance +- Section 5: Glossary & Column Definitions — all column descriptions from notebook markdown + +**5.2** Create `summary/externalDependenciesSummary.sh`: +- Follow [anomalyDetectionSummary.sh](domains/anomaly-detection/summary/anomalyDetectionSummary.sh) pattern +- Read CSV files → generate markdown table snippet includes +- Conditionally include SVG chart references if files exist +- Generate front matter (title, date, git tag) +- Assemble final `external_dependencies_report.md` + +**5.3** Create `externalDependenciesMarkdown.sh`: +- Thin delegator to `summary/externalDependenciesSummary.sh` 
(matching [anomalyDetectionMarkdown.sh](domains/anomaly-detection/anomalyDetectionMarkdown.sh) pattern) + +#### Phase 6: Exploration Notebooks + +**6.1** Copy [jupyter/ExternalDependenciesJava.ipynb](jupyter/ExternalDependenciesJava.ipynb) → `explore/ExternalDependenciesJava.ipynb`, add `"code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse"` to metadata (matching [AnomalyDetectionExploration.ipynb](domains/anomaly-detection/explore/AnomalyDetectionExploration.ipynb)). + +**6.2** Copy [jupyter/ExternalDependenciesTypescript.ipynb](jupyter/ExternalDependenciesTypescript.ipynb) → `explore/ExternalDependenciesTypescript.ipynb`, same metadata update. + +--- + +### Relevant Files + +**To create** (in `domains/external-dependencies/`): +- `README.md`, 3 entry point `.sh`, `externalDependencyCharts.py`, `summary/externalDependenciesSummary.sh`, `summary/report.template.md` + +**To copy** (35 `.cypher` + 2 `.ipynb`): +- [cypher/External_Dependencies/](cypher/External_Dependencies/) → `queries/` +- [jupyter/ExternalDependenciesJava.ipynb](jupyter/ExternalDependenciesJava.ipynb), [jupyter/ExternalDependenciesTypescript.ipynb](jupyter/ExternalDependenciesTypescript.ipynb) → `explore/` + +**Reference (read-only)**: +- [scripts/executeQueryFunctions.sh](scripts/executeQueryFunctions.sh) — `execute_cypher()`, `extractQueryParameter()` +- [scripts/cleanupAfterReportGeneration.sh](scripts/cleanupAfterReportGeneration.sh) +- [scripts/reports/ExternalDependenciesCsv.sh](scripts/reports/ExternalDependenciesCsv.sh) — query ordering reference +- [domains/anomaly-detection/anomalyDetectionFeaturePlots.py](domains/anomaly-detection/anomalyDetectionFeaturePlots.py) — `Parameters`, `get_file_path()`, Neo4j query pattern +- [domains/anomaly-detection/anomalyDetectionCsv.sh](domains/anomaly-detection/anomalyDetectionCsv.sh) — shell boilerplate +- [domains/anomaly-detection/summary/anomalyDetectionSummary.sh](domains/anomaly-detection/summary/anomalyDetectionSummary.sh) — 
template assembly, `` pattern + +### Verification + +1. **Structure**: Domain contains 3 `.sh` entry points, 1 `.py`, ~35 `.cypher` in queries/, 2 `.ipynb` in explore/, summary/ with `.sh` and `.md` +2. **Shell lint**: `shellcheck domains/external-dependencies/*.sh domains/external-dependencies/summary/*.sh` +3. **Python lint**: `python -m py_compile domains/external-dependencies/externalDependencyCharts.py` +4. **Pipeline discovery**: `find domains/ -name "*Csv.sh"`, `*Python.sh`, `*Markdown.sh` all return the new domain's scripts +5. **Notebook metadata**: Both explore/ notebooks contain `"code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse"` +6. **Cypher count**: `ls domains/external-dependencies/queries/*.cypher | wc -l` matches original count +7. **No external changes**: No new files outside `domains/external-dependencies/` +8. **README completeness**: Documents prerequisites, entry points, folder structure + +### Scope Boundaries + +**Included**: All 35 cypher queries, ~38 SVG chart conversions, CSV entry point, Markdown summary, exploration notebooks, prerequisites documentation, package.json queries + +**Excluded**: Graph visualizations (GraphViz), moving/deleting originals, reset queries, changes to central pipeline scripts, validation cypher query diff --git a/.github/prompts/plan-internal_dependencies_domain.prompt.md b/.github/prompts/plan-internal_dependencies_domain.prompt.md new file mode 100644 index 000000000..8f7348a3d --- /dev/null +++ b/.github/prompts/plan-internal_dependencies_domain.prompt.md @@ -0,0 +1,379 @@ +# Plan: Create Internal Dependencies Domain + +Create a new self-contained `domains/internal-dependencies/` domain following the `external-dependencies` and `anomaly-detection` reference patterns. This domain covers **internal dependencies**, **cyclic dependencies**, **path finding**, and **topological sort** — all analyses of how internal code units depend on each other across multiple abstraction levels. 
Copy all relevant Cypher queries, shell scripts, and Jupyter notebooks. Convert path finding notebook charts to a Python SVG chart generator. Place the notebook copies in an `explore/` folder with validation disabled. Assemble a Markdown summary report. No moves or deletions of originals. + +### Decisions + +- **Domain name**: `internal-dependencies`. Path finding, topological sort, and cyclic dependencies are implementation details or analysis methods applied to internal dependencies across different abstraction levels. +- **Topological Sort**: Included in the domain. All 5 Cypher queries and the full `TopologicalSortCsv.sh` logic are copied. +- **Cyclic Dependencies**: Included. Only the 7 Cypher files actually used by existing scripts and notebooks are copied (excluding `Cyclic_Dependencies_Concatenated.cypher` and `Cyclic_Dependencies_as_Nodes.cypher`, which are unreferenced). +- **Report output directory**: Everything under `reports/internal-dependencies/`. This is a **breaking change** vs. the old directories (`internal-dependencies-csv/`, `path-finding-csv/`, `topology-csv/`, `internal-dependencies-visualization/`, `path-finding-visualization/`). When the old scripts are eventually removed, a major version bump is required. +- **Report subdirectory structure**: Orient by abstraction level following the anomaly-detection pattern: `Java_Package/`, `Java_Artifact/`, `Java_Type/`, `Typescript_Module/`, `NPM_NonDevPackage/`, `NPM_DevPackage/`. General results (e.g. file distance, overall cyclic dependencies) go into the main `reports/internal-dependencies/` directory. Graph visualizations within an abstraction level get a `Graph_Visualizations/` subfolder when there are multiple files. +- **Dependencies_Projection**: Documented as a core prerequisite. Not copied. Follow-up planned to rethink core dependency placement. +- **projectionFunctions.sh**: Documented as a core prerequisite. Not copied. Sourced from `../../scripts/`. 
+- **Charts**: Only path finding notebooks produce charts (~27 bar + pie charts). Internal dependencies notebooks are tables only. The Python chart script converts path finding visualizations to SVG. +- **Entry point naming**: camelCase prefix matching domain name: `internalDependenciesCsv.sh`, `internalDependenciesPython.sh`, `internalDependenciesVisualization.sh`, `internalDependenciesMarkdown.sh`. +- **Extra Cypher files from notebooks**: `Artifacts_with_duplicate_packages.cypher` (from `Artifact_Dependencies/`) and `Annotated_code_elements.cypher` (from `Java/`) are referenced by `InternalDependenciesJava.ipynb`. These are copied into the domain `queries/` directory for the explore notebook but are NOT executed by the CSV entry point (they serve exploratory analysis). +- **Reset folder**: Not needed. Internal dependency analysis does not create persistent labels that need removal. +- **Markdown summary**: Rich, structured report with tables, chart references, architectural descriptions, and a glossary. Designed for both humans and AI agents. Every linked table and chart gets a short description (1–3 sentences). Algorithm explanations and domain concepts from the Jupyter notebooks are distilled into concise, improved prose in the report template — not copied verbatim but condensed for clarity and LLM-friendliness. +- **Deprecated files tracking**: A `COPIED_FILES.md` documents every file that was copied so a follow-up task can cleanly remove the originals. + +### Prerequisites (Documented in README and PREREQUISITES.md, Not Copied) + +The following are provided by the central pipeline and must run *before* this domain: + +1. **Neo4j running** with scanned artifacts loaded +2. **DEPENDS_ON relationships** between types (jQAssistant scan) +3. **Type labels** ([cypher/Types/](../../cypher/Types/)): `PrimitiveType`, `Void`, `JavaType`, `ResolvedDuplicateType` +4. 
**Weight properties** on DEPENDS_ON ([cypher/DependsOn_Relationship_Weights/](../../cypher/DependsOn_Relationship_Weights/)): `weight`, `weightInterfaces`, `weight25PercentInterfaces` +5. **Dependencies Projection** ([cypher/Dependencies_Projection/](../../cypher/Dependencies_Projection/)): provides `createDirectedDependencyProjection` and related graph projection management. Used by path finding and topological sort. +6. **Projection functions** ([scripts/projectionFunctions.sh](../../scripts/projectionFunctions.sh)): shell functions wrapping Dependencies_Projection Cypher queries +7. **TypeScript enrichment** ([cypher/Typescript_Enrichment/](../../cypher/Typescript_Enrichment/)): module properties, namespace, `isNodeModule`, `IS_IMPLEMENTED_IN` resolution, npm linking, `lowCouplingElement25PercentWeight` +8. **General enrichment** ([cypher/General_Enrichment/](../../cypher/General_Enrichment/)): `name` and `extension` properties on `File` nodes +9. **Metrics** ([cypher/Metrics/](../../cypher/Metrics/)): dependency degree calculations (incoming/outgoing) used indirectly + +### Domain Directory Structure + +``` +domains/internal-dependencies/ +├── README.md +├── PREREQUISITES.md # Detailed prerequisite documentation +├── COPIED_FILES.md # Tracking: original → copy mapping for deprecation follow-up +├── internalDependenciesCsv.sh # Entry point: CSV reports (*Csv.sh) +├── internalDependenciesPython.sh # Entry point: Python charts (*Python.sh) +├── internalDependenciesVisualization.sh # Entry point: Graph visualizations (*Visualization.sh) +├── internalDependenciesMarkdown.sh # Entry point: Markdown summary (*Markdown.sh) +├── pathFindingCharts.py # Chart generation: bar, pie → SVG +├── explore/ +│ ├── InternalDependenciesJava.ipynb # Original notebook (ValidateAlwaysFalse) +│ ├── InternalDependenciesTypescript.ipynb # Original notebook (ValidateAlwaysFalse) +│ ├── PathFindingJava.ipynb # Original notebook (ValidateAlwaysFalse) +│ └── PathFindingTypescript.ipynb # 
Original notebook (ValidateAlwaysFalse) +├── queries/ +│ ├── internal-dependencies/ # 14 files from cypher/Internal_Dependencies/ +│ │ ├── Candidates_for_Interface_Segregation.cypher +│ │ ├── Get_file_distance_as_shortest_contains_path_for_dependencies.cypher +│ │ ├── How_many_classes_compared_to_all_existing_in_the_same_package_are_used_by_dependent_packages_across_different_artifacts.cypher +│ │ ├── How_many_elements_compared_to_all_existing_are_used_by_dependent_modules_for_Typescript.cypher +│ │ ├── How_many_packages_compared_to_all_existing_are_used_by_dependent_artifacts.cypher +│ │ ├── Inter_scan_and_project_dependencies_of_Typescript_modules.cypher +│ │ ├── Java_Artifact_build_levels_for_graphviz.cypher +│ │ ├── List_all_Java_artifacts.cypher +│ │ ├── List_all_Typescript_modules.cypher +│ │ ├── List_elements_that_are_used_by_many_different_modules_for_Typescript.cypher +│ │ ├── List_types_that_are_used_by_many_different_packages.cypher +│ │ ├── NPM_Package_build_levels_for_graphviz.cypher +│ │ ├── Set_file_distance_as_shortest_contains_path_for_dependencies.cypher +│ │ └── Typescript_Module_build_levels_for_graphviz.cypher +│ ├── cyclic-dependencies/ # 7 files from cypher/Cyclic_Dependencies/ +│ │ ├── Cyclic_Dependencies.cypher +│ │ ├── Cyclic_Dependencies_between_Artifacts_as_unwinded_List.cypher +│ │ ├── Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.cypher +│ │ ├── Cyclic_Dependencies_Breakdown_Backward_Only.cypher +│ │ ├── Cyclic_Dependencies_Breakdown_for_Typescript.cypher +│ │ ├── Cyclic_Dependencies_Breakdown.cypher +│ │ └── Cyclic_Dependencies_for_Typescript.cypher +│ ├── path-finding/ # 15 files from cypher/Path_Finding/ +│ │ ├── Path_Finding_1_Create_Projection.cypher +│ │ ├── Path_Finding_2_Estimate_Memory.cypher +│ │ ├── Path_Finding_3_Depth_First_Search_Path.cypher +│ │ ├── Path_Finding_4_Breadth_First_Search_Path.cypher +│ │ ├── Path_Finding_5_All_pairs_shortest_path_distribution_overall.cypher +│ │ ├── 
Path_Finding_5_All_pairs_shortest_path_distribution_per_project.cypher +│ │ ├── Path_Finding_5_All_pairs_shortest_path_examples.cypher +│ │ ├── Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher +│ │ ├── Path_Finding_6_Longest_paths_distribution_overall.cypher +│ │ ├── Path_Finding_6_Longest_paths_distribution_per_project.cypher +│ │ ├── Path_Finding_6_Longest_paths_examples.cypher +│ │ ├── Path_Finding_6_Longest_paths_for_graphviz.cypher +│ │ ├── Set_Parameters.cypher +│ │ ├── Set_Parameters_NonDevNpmPackage.cypher +│ │ └── Set_Parameters_Typescript_Module.cypher +│ ├── topological-sort/ # 5 files from cypher/Topological_Sort/ +│ │ ├── Set_Parameters.cypher +│ │ ├── Topological_Sort_Exists.cypher +│ │ ├── Topological_Sort_List.cypher +│ │ ├── Topological_Sort_Query.cypher +│ │ └── Topological_Sort_Write.cypher +│ └── exploration/ # 2 files only referenced by explore/ notebooks +│ ├── Artifacts_with_duplicate_packages.cypher +│ └── Annotated_code_elements.cypher +├── graphs/ +│ └── internalDependenciesGraphs.sh # Graph visualization orchestration +└── summary/ + ├── internalDependenciesSummary.sh # Markdown assembly logic + └── report.template.md # Main report template +``` + +--- + +### Steps + +#### Phase 1: Scaffolding & Cypher Queries + +**1.1** Create the domain directory structure with all subdirectories: +`domains/internal-dependencies/{explore,queries/{internal-dependencies,cyclic-dependencies,path-finding,topological-sort,exploration},graphs,summary}` + +**1.2** Copy the 14 `.cypher` files from [cypher/Internal_Dependencies/](../../cypher/Internal_Dependencies/) into `queries/internal-dependencies/`. + +**1.3** Copy the 7 used `.cypher` files from [cypher/Cyclic_Dependencies/](../../cypher/Cyclic_Dependencies/) into `queries/cyclic-dependencies/`. +Excluded: `Cyclic_Dependencies_Concatenated.cypher`, `Cyclic_Dependencies_as_Nodes.cypher` (unreferenced by any script or notebook). 
+ +**1.4** Copy all 15 `.cypher` files from [cypher/Path_Finding/](../../cypher/Path_Finding/) into `queries/path-finding/`. + +**1.5** Copy all 5 `.cypher` files from [cypher/Topological_Sort/](../../cypher/Topological_Sort/) into `queries/topological-sort/`. + +**1.6** Copy 2 exploration-only `.cypher` files into `queries/exploration/`: +- [cypher/Artifact_Dependencies/Artifacts_with_duplicate_packages.cypher](../../cypher/Artifact_Dependencies/Artifacts_with_duplicate_packages.cypher) +- [cypher/Java/Annotated_code_elements.cypher](../../cypher/Java/Annotated_code_elements.cypher) + +**1.7** Create `PREREQUISITES.md` documenting all external dependencies (see Prerequisites section above). + +**1.8** Create `COPIED_FILES.md` tracking every original → copy mapping for the deprecation follow-up. + +**1.9** Create `README.md` — domain overview, entry points, folder structure, prerequisites reference, what the domain produces (matching [external-dependencies README](../../domains/external-dependencies/README.md) format). 
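The copy steps of Phase 1 share one pattern: copy files without touching the originals and record every original → copy pair for `COPIED_FILES.md`. The following is a minimal Python sketch of that bookkeeping, not the plan's actual implementation. The helper names `copy_with_tracking` and `render_copied_files_markdown`, the `exclude` parameter, and the two-column table layout are assumptions made for illustration.

```python
from pathlib import Path
import shutil


def copy_with_tracking(source_dir, target_dir, exclude=()):
    """Copy all .cypher files except the excluded names; return (original, copy) pairs."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source_file in sorted(source_dir.glob("*.cypher")):
        if source_file.name in exclude:
            continue  # e.g. unreferenced queries that are intentionally left behind
        target_file = target_dir / source_file.name
        shutil.copy2(source_file, target_file)  # the original stays untouched
        copied.append((source_file, target_file))
    return copied


def render_copied_files_markdown(copied):
    """Render the pairs as a Markdown table (hypothetical COPIED_FILES.md layout)."""
    lines = ["| Original | Copy |", "| --- | --- |"]
    lines += [
        f"| `{original.as_posix()}` | `{copy.as_posix()}` |"
        for original, copy in copied
    ]
    return "\n".join(lines) + "\n"
```

Under these assumptions, step 1.3 would call `copy_with_tracking` once with the two unreferenced file names passed as `exclude`, and the pairs collected from all calls would be rendered together in step 1.8.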
+ +#### Phase 2: CSV Entry Point Script + +**2.1** Create `internalDependenciesCsv.sh`: +- Follow exact boilerplate pattern of [anomalyDetectionCsv.sh](../../domains/anomaly-detection/anomalyDetectionCsv.sh) for the domain script directory resolution (`BASH_SOURCE`/`CDPATH`, `set -o errexit -o pipefail`) +- Source `../../scripts/executeQueryFunctions.sh` for `execute_cypher()` and `execute_cypher_queries_until_results()` +- Source `../../scripts/projectionFunctions.sh` for `createDirectedDependencyProjection` and related projection functions +- Main report directory: `reports/internal-dependencies` +- Create abstraction-level subdirectories: `Java_Artifact/`, `Java_Package/`, `Java_Type/`, `Typescript_Module/`, `NPM_NonDevPackage/`, `NPM_DevPackage/` +- Combine logic from [InternalDependenciesCsv.sh](../../scripts/reports/InternalDependenciesCsv.sh), [PathFindingCsv.sh](../../scripts/reports/PathFindingCsv.sh), and [TopologicalSortCsv.sh](../../scripts/reports/TopologicalSortCsv.sh) +- General results (file distance, overall cyclic deps) go into main report directory +- Per-abstraction-level results go into their respective subdirectories + +**Internal dependencies queries** — replicate ordering from [InternalDependenciesCsv.sh](../../scripts/reports/InternalDependenciesCsv.sh): +1. File distance (Get + Set) → main directory +2. Java cyclic dependencies (3 queries) → `Java_Package/` +3. Artifact cyclic dependencies (unwinded) → `Java_Artifact/` +4. Interface segregation candidates → `Java_Package/` +5. List all Java artifacts → `Java_Artifact/` +6. Widely used types → `Java_Package/` +7. Package usage by dependent artifacts → `Java_Artifact/` +8. Class usage across artifacts → `Java_Artifact/` +9. TypeScript cyclic dependencies (3 queries) → `Typescript_Module/` +10. List all TypeScript modules → `Typescript_Module/` +11. Widely used TypeScript elements → `Typescript_Module/` +12. 
Module element usage → `Typescript_Module/` + +**Path finding queries** — replicate logic from [PathFindingCsv.sh](../../scripts/reports/PathFindingCsv.sh): +1. Java Artifact: all pairs shortest path + longest path → `Java_Artifact/` +2. Java Package: all pairs shortest path + longest path → `Java_Package/` +3. TypeScript Module: all pairs shortest path + longest path → `Typescript_Module/` +4. NPM Non-Dev Package: all pairs shortest path + longest path → `NPM_NonDevPackage/` +5. NPM Dev Package: all pairs shortest path + longest path → `NPM_DevPackage/` +(Java Type and Method path finding remain deactivated as in original) + +**Topological sort queries** — replicate logic from [TopologicalSortCsv.sh](../../scripts/reports/TopologicalSortCsv.sh): +1. Java Artifact → `Java_Artifact/` +2. Java Package → `Java_Package/` +3. Java Type → `Java_Type/` +4. TypeScript Module → `Typescript_Module/` +5. NPM Non-Dev Package → `NPM_NonDevPackage/` +6. NPM Dev Package → `NPM_DevPackage/` + +**2.2** Clean-up: source `cleanupAfterReportGeneration.sh` for each subdirectory and the main directory. 
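The topological sort queries listed above produce a build order per abstraction level. As a conceptual illustration only (the plan runs the Cypher queries from `queries/topological-sort/`, not Python), the sketch below derives build levels from a small dependency map, assuming that level 0 means "no dependencies" and every other node sits one level above the highest level it depends on. The function name `build_levels` and the sample data are hypothetical.

```python
def build_levels(dependencies):
    """Map each node to its build level.

    `dependencies` maps a node to the nodes it depends on.
    Level 0 = no dependencies; otherwise 1 + the highest level among its dependencies.
    Raises ValueError if the graph contains a cycle.
    """
    nodes = set(dependencies)
    for targets in dependencies.values():
        nodes.update(targets)

    levels = {}
    remaining = set(nodes)
    while remaining:
        # A node is ready once every node it depends on already has a level.
        ready = {
            node for node in remaining
            if all(target in levels for target in dependencies.get(node, ()))
        }
        if not ready:
            raise ValueError(f"cyclic dependency among: {sorted(remaining)}")
        for node in ready:
            levels[node] = max(
                (levels[target] + 1 for target in dependencies.get(node, ())),
                default=0,
            )
        remaining -= ready
    return levels


if __name__ == "__main__":
    sample = {"app": ["service", "util"], "service": ["util"], "util": []}
    print(build_levels(sample))
```

The explicit `ValueError` reflects why cyclic dependencies matter for this analysis: a build order only exists when the dependency graph is acyclic.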
+ +#### Phase 3: Python Chart Generation Script (*parallel with Phase 4*) + +**3.1** Create `pathFindingCharts.py`: +- Follow `Parameters` class pattern from [externalDependencyCharts.py](../../domains/external-dependencies/externalDependencyCharts.py): `--report_directory`, `--verbose` parameters +- Neo4j connection using `neo4j` Python driver +- Load and execute Cypher `.cypher` files from `queries/path-finding/` directory + +**3.2** Data processing functions (extracted from PathFinding notebooks): +- `pivot_distribution_by_project(data_frame, distance_column, count_column, project_column)` → DataFrame +- `normalize_distribution_by_project(pivoted_data)` → DataFrame +- `format_pie_label(percentage, all_values)` → str (percentage + count format) + +**3.3** Chart generation functions: +- `plot_distribution_bar(data_frame, distance_column, count_column, title, file_path)` → saves SVG +- `plot_distribution_pie(data_frame, distance_column, count_column, title, file_path)` → saves SVG +- `plot_per_project_stacked_bar(pivoted_data, title, file_path, use_log_scale)` → saves SVG +- `plot_per_project_normalized_bar(normalized_data, title, file_path)` → saves SVG +- `plot_diameter_bar(data_frame, project_column, diameter_column, title, file_path)` → saves SVG + +**3.4** Per-abstraction-level chart generation: +- **Java Package**: all pairs shortest path (bar, pie, stacked log, stacked normalized, diameter) + longest path (bar, pie, stacked log, stacked normalized, max per artifact) = ~10 charts +- **Java Artifact**: all pairs shortest path (bar, pie) + longest path (bar, pie) = ~4 charts +- **TypeScript Module**: same as Java Package = ~10 charts +- **NPM packages**: if data exists, same pattern + +**3.5** `main()` function: parse arguments, connect Neo4j, generate charts per abstraction level, handle "no data" gracefully and skip. 
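The data processing functions of step 3.2 can be sketched as follows. To stay dependency-free, this sketch swaps the pandas DataFrames named in the plan for plain lists of dictionaries, so it is an analogue of the planned functions rather than their implementation. The label format of `format_pie_label` and the column names in the usage example are assumptions.

```python
from collections import defaultdict


def pivot_distribution_by_project(rows, distance_column, count_column, project_column):
    """Pivot rows into {project: {distance: count}}, summing duplicates and filling gaps with 0."""
    distances = sorted({row[distance_column] for row in rows})
    pivoted = defaultdict(lambda: dict.fromkeys(distances, 0))
    for row in rows:
        pivoted[row[project_column]][row[distance_column]] += row[count_column]
    return dict(pivoted)


def normalize_distribution_by_project(pivoted_data):
    """Convert each project's counts into percentages of that project's total."""
    normalized = {}
    for project, counts in pivoted_data.items():
        total = sum(counts.values())
        normalized[project] = {
            distance: (100.0 * count / total if total else 0.0)
            for distance, count in counts.items()
        }
    return normalized


def format_pie_label(percentage, all_values):
    """Label text combining the slice percentage with the absolute count it represents."""
    absolute = round(percentage / 100.0 * sum(all_values))
    return f"{percentage:.1f}% ({absolute})"


if __name__ == "__main__":
    # Hypothetical column names; the real ones come from the path finding queries.
    rows = [
        {"project": "a", "distance": 1, "pairCount": 3},
        {"project": "a", "distance": 2, "pairCount": 1},
        {"project": "b", "distance": 1, "pairCount": 2},
    ]
    pivoted = pivot_distribution_by_project(rows, "distance", "pairCount", "project")
    print(pivoted)
    print(normalize_distribution_by_project(pivoted))
    print(format_pie_label(75.0, [3, 1]))
```

Filling missing distances with 0 keeps every project on the same set of columns, which is what a stacked bar chart needs. A function shaped like `format_pie_label` could be passed to matplotlib's `autopct` parameter through a small lambda that binds `all_values`.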
+ +#### Phase 4: Entry Point Shell Scripts (*parallel with Phase 3*) + +**4.1** Create `internalDependenciesPython.sh`: +- Follow pattern of [externalDependenciesPython.sh](../../domains/external-dependencies/externalDependenciesPython.sh) +- Set script directory, report directory +- Call `python pathFindingCharts.py --report_directory "${FULL_REPORT_DIRECTORY}" ${verboseMode}` + +**4.2** Create `internalDependenciesVisualization.sh`: +- Follow entry-point delegation pattern of [anomalyDetectionVisualization.sh](../../domains/anomaly-detection/anomalyDetectionVisualization.sh) +- Delegate to `graphs/internalDependenciesGraphs.sh` + +**4.3** Create `internalDependenciesMarkdown.sh`: +- Follow entry-point delegation pattern of [anomalyDetectionMarkdown.sh](../../domains/anomaly-detection/anomalyDetectionMarkdown.sh) +- Delegate to `summary/internalDependenciesSummary.sh` + +#### Phase 5: Graph Visualizations + +**5.1** Create `graphs/internalDependenciesGraphs.sh`: +- Combine logic from [InternalDependenciesVisualization.sh](../../scripts/reports/InternalDependenciesVisualization.sh) and [PathFindingVisualization.sh](../../scripts/reports/PathFindingVisualization.sh) +- Source `../../scripts/executeQueryFunctions.sh`, `../../scripts/projectionFunctions.sh` +- Use `../../scripts/visualization/visualizeQueryResults.sh` for CSV → DOT → SVG conversion +- Output structure: + - `Java_Artifact/Graph_Visualizations/JavaArtifactBuildLevels.{csv,dot,svg}` + - `Java_Artifact/Graph_Visualizations/JavaArtifactLongestPathsIsolated.{csv,dot,svg}` + - `Java_Artifact/Graph_Visualizations/JavaArtifactLongestPaths.{csv,dot,svg}` + - `Typescript_Module/Graph_Visualizations/TypeScriptModuleBuildLevels.{csv,dot,svg}` + - `Typescript_Module/Graph_Visualizations/TypeScriptModuleLongestPathsIsolated.{csv,dot,svg}` + - `Typescript_Module/Graph_Visualizations/TypeScriptModuleLongestPaths.{csv,dot,svg}` + - `NPM_NonDevPackage/Graph_Visualizations/NpmPackageBuildLevels.{csv,dot,svg}` + - 
`NPM_NonDevPackage/Graph_Visualizations/NpmNonDevPackageLongestPathsIsolated.{csv,dot,svg}` + - `NPM_NonDevPackage/Graph_Visualizations/NpmNonDevPackageLongestPaths.{csv,dot,svg}` + - `NPM_DevPackage/Graph_Visualizations/NpmDevPackageLongestPathsIsolated.{csv,dot,svg}` + - `NPM_DevPackage/Graph_Visualizations/NpmDevPackageLongestPaths.{csv,dot,svg}` +- For each visualization: create projection, ensure topological sort exists for level info, execute graphviz query, run visualizeQueryResults.sh + +#### Phase 6: Markdown Summary + +**Design principle**: The Jupyter notebooks contain valuable explanatory descriptions that **must** be distilled into the Markdown summary. Rewrite them in a concise, improved, and summarized way optimized for both human readability and LLM consumption. Every linked table and chart in the report should have a short description (1–3 sentences) explaining what it shows and how to interpret it. Algorithm explanations should be condensed from the multi-paragraph notebook prose into focused paragraphs. + +**6.1** Create `summary/report.template.md`: +- YAML front matter (title, date, model version) + +- **Section 1: Executive Overview** — total artifacts, packages, modules, key structural observations + +- **Section 2: Cyclic Dependencies** + - Introductory paragraph: Explain what cyclic dependencies are, why they complicate builds and maintenance, and the resolution strategy. Condense from the notebook: *"A cycle group is a set of packages with mutual dependencies. The `forwardToBackwardBalance` metric indicates which dependencies, when reversed, would most effectively dissolve the cycle — values near 1.0 mean mostly forward dependencies (few backward dependencies to remove)."* + - Java package/artifact cycle tables (Table 2a, 2b, 2c pattern) with short per-table descriptions: + - **Table 2a (Overview)**: "Lists cycle groups with `numberForward` (dependencies in cycle direction) and `numberBackward` (dependencies against cycle direction). 
Sorted by `forwardToBackwardBalance` descending — groups at the top are easiest to fix." + - **Table 2b (Breakdown)**: "Expands each cycle group into individual dependencies in `type1 → type2` format, forward and backward." + - **Table 2c (Backward Only)**: "Shows only backward dependencies — the most promising candidates for removal or reversal to break cycles." + - TypeScript module cycles (same table pattern) + +- **Section 3: Java Internal Structure** + - **Artifact listings**: Short description: "Java artifacts sorted by their number of packages, types, and incoming/outgoing dependencies. Reveals the largest and most connected components." + - **Interface Segregation Principle candidates**: Condense from notebook: *"Based on Robert C. Martin's Interface Segregation Principle — 'Clients should not be forced to depend upon interfaces that they do not use.' This table shows packages where dependent packages use only a small fraction of the available types, indicating potential for splitting."* Short description for the linked table: "Packages sorted by the ratio of types actually used vs. types available." + - **Widely used types**: "Types used by the highest number of different packages. These are cross-cutting concerns or core abstractions." + - **Package usage by dependent artifacts**: "Packages where dependent artifacts use only a few of the available packages, indicating loose coupling at the artifact level." + - **Class usage across artifacts**: "Classes used by multiple artifacts — candidates for extraction into a shared library." + - **Duplicate package names**: Condense from notebook: *"Duplicate package names across artifacts can cause class loader conflicts and break package-protected access assumptions."* + - **Annotated elements**: "Code elements with annotations — reveals framework coupling and configuration density." 
+ - **File distance distribution**: Condense from notebook: *"Intuitively, the distance is the fewest number of `cd` (change directory) commands needed to navigate between a source file and the dependency it uses. Aggregated to show how many dependencies are co-located (distance 0), one directory apart (distance 1), and so on."* + +- **Section 4: TypeScript Internal Structure** + - **Module listings**: "TypeScript modules sorted by their number of elements, incoming/outgoing dependencies." + - **Widely used elements**: "Elements used by the highest number of different modules." + - **Module element usage**: "Modules where dependents use only a few of the available elements, indicating over-broad module interfaces." + - **File distance distribution**: Same explanation as Java (adapted for modules) + +- **Section 5: Path Finding** + - Introductory paragraph: Condense the notebook's algorithm overview into a focused explanation: *"Path finding algorithms reveal the depth and complexity of dependency chains. **All Pairs Shortest Path** computes the minimum distance between every pair of connected nodes — distance 1 = direct dependency, distance 2 = one intermediary, etc. The **Graph Diameter** (longest shortest path) is a complexity metric: a diameter of 6 means at least one pair of modules requires a chain of 6 dependencies to connect. The **Longest Path** (for directed acyclic graphs) shows the worst-case dependency chain — relevant for build ordering since an artifact can only be built after everything it depends on."* + - Per-abstraction-level subsections (Java Package, Java Artifact, TypeScript Module, NPM packages): + - **All pairs shortest path**: "Distribution of shortest path distances. A peak at distance 1 indicates many direct dependencies; a long tail suggests deep transitive chains." 
+ - Total distribution (bar + pie chart with descriptions: "Bar chart showing path count per distance" / "Pie chart showing percentage of paths per distance") + - Per-artifact/project distribution: Short description: *"Stacked bar charts (absolute and normalized) showing how shortest path distances distribute across artifacts/projects. Filtering is applied to pairs within the same artifact; however, intermediate nodes may cross artifact boundaries, reflecting real-world transitive dependencies."* + - Graph diameter per artifact/project: "Top artifacts/projects ranked by their graph diameter (longest shortest path)." + - **Longest path**: "The longest dependency chain for directed acyclic graphs. Higher values indicate deeper dependency hierarchies and greater build complexity." + - Note: *"Requires a Directed Acyclic Graph (DAG). Results may be inaccurate when cycles exist."* + - Total distribution (bar + pie chart) + - Per-artifact/project distribution (stacked bar charts) + - Max longest path per artifact/project + +- **Section 6: Topological Sort** — build order and build levels per abstraction level. Short description: "Build levels derived from topological ordering of the dependency graph. Level 0 nodes have no dependencies; higher levels depend on lower ones." + +- **Section 7: Graph Visualizations** — build level and longest path SVG references. Per visualization: 1-sentence description of what the graph shows (e.g., "Directed graph of Java artifacts colored by build level, showing the build dependency hierarchy.") + +- **Section 8: Glossary & Column Definitions** + - `forwardToBackwardBalance`: "Ratio indicating how many dependencies in a cycle group flow forward vs. backward. Values near 1.0 = mostly forward (easy to fix); near 0.0 = mostly backward." + - `numberForward` / `numberBackward`: "Count of dependencies flowing in cycle direction / against it." 
+ - `Graph Diameter`: "The longest shortest path across all pairs — a measure of dependency depth and structural complexity." + - `Longest Path`: "Maximum-length directed path in a DAG — the worst-case dependency chain." + - `File Distance`: "Minimum number of directory traversals between a source file and its dependency." + - `Build Level`: "Topological sort level. Level 0 = no dependencies, level N = depends on nodes at levels < N." + +**6.2** Create `summary/internalDependenciesSummary.sh`: +- Follow [externalDependenciesSummary.sh](../../domains/external-dependencies/summary/externalDependenciesSummary.sh) pattern +- Execute summary-specific Cypher queries → Markdown table includes +- Conditionally include SVG chart references if files exist +- Conditionally include graph visualization SVG references +- Generate front matter (title, date, git tag) +- Use `scripts/markdown/embedMarkdownIncludes.sh` for template assembly +- Assemble final `internal_dependencies_report.md` + +**6.3** Create `internalDependenciesMarkdown.sh`: +- Thin delegator to `summary/internalDependenciesSummary.sh` + +#### Phase 7: Exploration Notebooks + +**7.1** Copy [jupyter/InternalDependenciesJava.ipynb](../../jupyter/InternalDependenciesJava.ipynb) → `explore/InternalDependenciesJava.ipynb`, change metadata `"code_graph_analysis_pipeline_data_validation"` from `"ValidateJavaInternalDependencies"` to `"ValidateAlwaysFalse"`. + +**7.2** Copy [jupyter/InternalDependenciesTypescript.ipynb](../../jupyter/InternalDependenciesTypescript.ipynb) → `explore/InternalDependenciesTypescript.ipynb`, change metadata `"code_graph_analysis_pipeline_data_validation"` from `"ValidateTypescriptModuleDependencies"` to `"ValidateAlwaysFalse"`. + +**7.3** Copy [jupyter/PathFindingJava.ipynb](../../jupyter/PathFindingJava.ipynb) → `explore/PathFindingJava.ipynb`, change metadata `"code_graph_analysis_pipeline_data_validation"` from `"ValidateJavaPackageDependencies"` to `"ValidateAlwaysFalse"`. 
+ +**7.4** Copy [jupyter/PathFindingTypescript.ipynb](../../jupyter/PathFindingTypescript.ipynb) → `explore/PathFindingTypescript.ipynb`, change metadata `"code_graph_analysis_pipeline_data_validation"` from `"ValidateTypescriptModuleDependencies"` to `"ValidateAlwaysFalse"`. + +--- + +### Relevant Files + +**To create** (in `domains/internal-dependencies/`): +- `README.md`, `PREREQUISITES.md`, `COPIED_FILES.md` +- 4 entry point `.sh` files +- `pathFindingCharts.py` +- `graphs/internalDependenciesGraphs.sh` +- `summary/internalDependenciesSummary.sh`, `summary/report.template.md` + +**To copy** (43 `.cypher` + 4 `.ipynb`): +- [cypher/Internal_Dependencies/](../../cypher/Internal_Dependencies/) (14 files) → `queries/internal-dependencies/` +- [cypher/Cyclic_Dependencies/](../../cypher/Cyclic_Dependencies/) (7 of 9 files) → `queries/cyclic-dependencies/` +- [cypher/Path_Finding/](../../cypher/Path_Finding/) (15 files) → `queries/path-finding/` +- [cypher/Topological_Sort/](../../cypher/Topological_Sort/) (5 files) → `queries/topological-sort/` +- [cypher/Artifact_Dependencies/Artifacts_with_duplicate_packages.cypher](../../cypher/Artifact_Dependencies/Artifacts_with_duplicate_packages.cypher) → `queries/exploration/` +- [cypher/Java/Annotated_code_elements.cypher](../../cypher/Java/Annotated_code_elements.cypher) → `queries/exploration/` +- [jupyter/InternalDependenciesJava.ipynb](../../jupyter/InternalDependenciesJava.ipynb) → `explore/` +- [jupyter/InternalDependenciesTypescript.ipynb](../../jupyter/InternalDependenciesTypescript.ipynb) → `explore/` +- [jupyter/PathFindingJava.ipynb](../../jupyter/PathFindingJava.ipynb) → `explore/` +- [jupyter/PathFindingTypescript.ipynb](../../jupyter/PathFindingTypescript.ipynb) → `explore/` + +**Reference (read-only)**: +- [scripts/executeQueryFunctions.sh](../../scripts/executeQueryFunctions.sh) — `execute_cypher()`, `extractQueryParameter()`, `execute_cypher_queries_until_results()` +- 
[scripts/projectionFunctions.sh](../../scripts/projectionFunctions.sh) — `createDirectedDependencyProjection`, `createDirectedJavaTypeDependencyProjection` +- [scripts/cleanupAfterReportGeneration.sh](../../scripts/cleanupAfterReportGeneration.sh) +- [scripts/visualization/visualizeQueryResults.sh](../../scripts/visualization/visualizeQueryResults.sh) — CSV → GraphViz DOT → SVG +- [scripts/markdown/embedMarkdownIncludes.sh](../../scripts/markdown/embedMarkdownIncludes.sh) — template `` expansion +- [scripts/reports/InternalDependenciesCsv.sh](../../scripts/reports/InternalDependenciesCsv.sh) — query ordering reference +- [scripts/reports/PathFindingCsv.sh](../../scripts/reports/PathFindingCsv.sh) — path finding logic reference +- [scripts/reports/TopologicalSortCsv.sh](../../scripts/reports/TopologicalSortCsv.sh) — topological sort logic reference +- [scripts/reports/InternalDependenciesVisualization.sh](../../scripts/reports/InternalDependenciesVisualization.sh) — build level visualization reference +- [scripts/reports/PathFindingVisualization.sh](../../scripts/reports/PathFindingVisualization.sh) — longest path visualization reference +- [domains/external-dependencies/externalDependencyCharts.py](../../domains/external-dependencies/externalDependencyCharts.py) — `Parameters`, Neo4j query pattern +- [domains/external-dependencies/externalDependenciesCsv.sh](../../domains/external-dependencies/externalDependenciesCsv.sh) — domain CSV boilerplate +- [domains/external-dependencies/summary/externalDependenciesSummary.sh](../../domains/external-dependencies/summary/externalDependenciesSummary.sh) — template assembly reference +- [domains/anomaly-detection/anomalyDetectionCsv.sh](../../domains/anomaly-detection/anomalyDetectionCsv.sh) — subdirectory creation pattern +- [domains/anomaly-detection/anomalyDetectionVisualization.sh](../../domains/anomaly-detection/anomalyDetectionVisualization.sh) — visualization delegation pattern +- 
[domains/anomaly-detection/anomalyDetectionMarkdown.sh](../../domains/anomaly-detection/anomalyDetectionMarkdown.sh) — markdown delegation pattern + +### Verification + +1. **Structure**: Domain contains 4 `.sh` entry points, 1 `.py`, 43 `.cypher` in queries/, 4 `.ipynb` in explore/, `graphs/` with `.sh`, `summary/` with `.sh` and `.md` +2. **Shell lint**: `shellcheck domains/internal-dependencies/*.sh domains/internal-dependencies/graphs/*.sh domains/internal-dependencies/summary/*.sh` +3. **Python lint**: `python -m py_compile domains/internal-dependencies/pathFindingCharts.py` +4. **Pipeline discovery**: `find domains/ -name "*Csv.sh"`, `*Python.sh`, `*Visualization.sh`, `*Markdown.sh` all return the new domain's scripts +5. **Notebook metadata**: All 4 explore/ notebooks contain `"code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse"` +6. **Cypher count**: `find domains/internal-dependencies/queries/ -name "*.cypher" | wc -l` = 43 +7. **No external changes**: No modifications to files outside `domains/internal-dependencies/` +8. **README completeness**: Documents prerequisites, entry points, folder structure, produced outputs +9. 
**COPIED_FILES.md**: Lists every original → copy mapping for deprecation tracking + +### Scope Boundaries + +**Included**: 43 Cypher queries (14 internal deps + 7 cyclic deps + 15 path finding + 5 topological sort + 2 exploration), ~27 SVG chart conversions, CSV/Visualization/Python/Markdown entry points, Markdown summary, exploration notebooks, prerequisites documentation, graph visualizations, copied files tracking + +**Excluded**: Moving/deleting originals, Dependencies_Projection queries (core infrastructure), projectionFunctions.sh (core infrastructure), changes to central pipeline scripts, `Cyclic_Dependencies_Concatenated.cypher`, `Cyclic_Dependencies_as_Nodes.cypher` (unreferenced) diff --git a/CHANGELOG.md b/CHANGELOG.md index 24381b015..1777cc3fa 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -267,7 +267,7 @@ For all details see: https://github.com/JohT/code-graph-analysis-pipeline/releas * [External Dependencies](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/external-dependencies-java/ExternalDependenciesJava.md) contains detailed information about external library usage ([Notebook](./domains/external-dependencies/explore/ExternalDependenciesJava.ipynb)). * [Object Oriented Design Quality Metrics](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/object-oriented-design-metrics-java/ObjectOrientedDesignMetricsJava.md) is based on [OO Design Quality Metrics by Robert Martin](https://api.semanticscholar.org/CorpusID:18246616) ([Notebook](./jupyter/ObjectOrientedDesignMetricsJava.ipynb)). * [Overview](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/overview-java/OverviewJava.md) contains overall statistics and details about methods and their complexity. ([Notebook](./jupyter/OverviewJava.ipynb)). 
-* [Internal Dependencies](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/internal-dependencies-java/InternalDependenciesJava.md) is based on [Analyze java package metrics in a graph database](https://joht.github.io/johtizen/data/2023/04/21/java-package-metrics-analysis.html) and also includes cyclic dependencies ([Notebook](./jupyter/InternalDependenciesJava.ipynb)). +* [Internal Dependencies](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/internal-dependencies-java/InternalDependenciesJava.md) is based on [Analyze java package metrics in a graph database](https://joht.github.io/johtizen/data/2023/04/21/java-package-metrics-analysis.html) and also includes cyclic dependencies ([Notebook](./domains/internal-dependencies/explore/InternalDependenciesJava.ipynb)). * [Visibility Metrics](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/visibility-metrics-java/VisibilityMetricsJava.md) ([Notebook](./jupyter/VisibilityMetricsJava.ipynb)). * [Wordcloud](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/wordcloud/Wordcloud.md) contains a visual representation of package and class names ([Notebook](./jupyter/Wordcloud.ipynb)). 
@@ -284,6 +284,6 @@ Here are some reports that utilize Neo4j's [Graph Data Science Library](https:// * [External Dependencies (CSV)](./domains/external-dependencies/externalDependenciesCsv.sh) ([Example](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/external-dependencies-csv/External_package_usage_overall.csv)) * [Object Oriented Design Metrics (CSV)](./scripts/reports/ObjectOrientedDesignMetricsCsv.sh) ([Example](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/object-oriented-design-metrics-csv/MainSequenceAbstractnessInstabilityDistanceJava.csv)) * [Overview (CSV)](./scripts/reports/OverviewCsv.sh) ([Example](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/overview-csv/Cyclomatic_Method_Complexity.csv)) -* [Internal Dependencies - Cyclic (CSV)](./scripts/reports/InternalDependenciesCsv.sh) ([Example](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/internal-dependencies-csv/Cyclic_Dependencies_Breakdown_Backward_Only.csv)) -* [Internal Dependencies - Interface Segregation (CSV)](./scripts/reports/InternalDependenciesCsv.sh) ([Example](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/internal-dependencies-csv/InterfaceSegregationCandidates.csv)) +* [Internal Dependencies - Cyclic (CSV)](./domains/internal-dependencies/internalDependenciesCsv.sh) ([Example](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/internal-dependencies-csv/Cyclic_Dependencies_Breakdown_Backward_Only.csv)) +* [Internal Dependencies - Interface Segregation (CSV)](./domains/internal-dependencies/internalDependenciesCsv.sh) 
([Example](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/internal-dependencies-csv/InterfaceSegregationCandidates.csv)) * [Visibility Metrics (CSV)](./scripts/reports/VisibilityMetricsCsv.sh) ([Example](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/visibility-metrics-csv/RelativeVisibilityPerArtifact.csv)) diff --git a/README.md b/README.md index 0390eeb7a..fc44a134a 100644 --- a/README.md +++ b/README.md @@ -48,7 +48,7 @@ Here is an overview of [Jupyter Notebooks](https://jupyter.org) reports from [co - [External Dependencies](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/external-dependencies-java/ExternalDependenciesJava.md) contains detailed information about external library usage ([Notebook](./domains/external-dependencies/explore/ExternalDependenciesJava.ipynb)). - [Git History](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/git-history-general/GitHistoryGeneral.md) contains information about the git history of the analyzed code ([Notebook](./jupyter/GitHistoryGeneral.ipynb)). -- [Internal Dependencies](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/internal-dependencies-java/InternalDependenciesJava.md) is based on [Analyze java package metrics in a graph database](https://joht.github.io/johtizen/data/2023/04/21/java-package-metrics-analysis.html) and also includes cyclic dependencies ([Notebook](./jupyter/InternalDependenciesJava.ipynb)). 
+- [Internal Dependencies](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/internal-dependencies-java/InternalDependenciesJava.md) is based on [Analyze java package metrics in a graph database](https://joht.github.io/johtizen/data/2023/04/21/java-package-metrics-analysis.html) and also includes cyclic dependencies ([Notebook](./domains/internal-dependencies/explore/InternalDependenciesJava.ipynb)). - [Method Metrics](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/method-metrics-java/MethodMetricsJava.md) shows how the effective number of lines of code and the cyclomatic complexity are distributed across the methods in the code ([Notebook](./jupyter/MethodMetricsJava.ipynb)). - [Node Embeddings](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/node-embeddings-java/NodeEmbeddingsJava.md) shows how to generate node embeddings and to further reduce their dimensionality to be able to visualize them in a 2D plot ([Notebook](./jupyter/NodeEmbeddingsJava.ipynb)). - [Object Oriented Design Quality Metrics](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/object-oriented-design-metrics-java/ObjectOrientedDesignMetricsJava.md) is based on [OO Design Quality Metrics by Robert Martin](https://api.semanticscholar.org/CorpusID:18246616) ([Notebook](./jupyter/ObjectOrientedDesignMetricsJava.ipynb)). 
@@ -66,16 +66,16 @@ Here are some reports that utilize Neo4j's [Graph Data Science Library](https:// - [Centrality with Page Rank](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/centrality-csv/Package_Centrality_Page_Rank.csv) ([Source Script](./scripts/reports/CentralityCsv.sh)) - [Community Detection with Leiden](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/community-csv/Package_communityLeidenId_Community__Metrics.csv) ([Source Script](./scripts/reports/CommunityCsv.sh)) - [Node Embeddings with HashGNN](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/node-embeddings-csv/Package_Embeddings_HashGNN.csv) ([Source Script](./scripts/reports/NodeEmbeddingsCsv.sh)) -- [Path Finding with all pairs shortest path](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/path-finding-csv/Package_all_pairs_shortest_paths_distribution_per_project.csv) ([Source Script](./scripts/reports/PathFindingCsv.sh)) +- [Path Finding with all pairs shortest path](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/path-finding-csv/Package_all_pairs_shortest_paths_distribution_per_project.csv) ([Source Script](./domains/internal-dependencies/internalDependenciesCsv.sh)) - [Similarity with Jaccard](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/similarity-csv/Package_Similarity.csv) ([Source Script](./scripts/reports/SimilarityCsv.sh)) -- [Topology Sort](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/topology-csv/Package_Topological_Sort.csv) ([Source Script](./scripts/reports/TopologicalSortCsv.sh)) +- [Topology 
Sort](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/topology-csv/Package_Topological_Sort.csv) ([Source Script](./domains/internal-dependencies/internalDependenciesCsv.sh)) ### :art: Graph Visualization Here are some fully automated graph visualizations utilizing [GraphViz](https://graphviz.org)from [code-graph-analysis-examples](https://github.com/JohT/code-graph-analysis-examples): -- [Java Artifact Build Levels](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/internal-dependencies-visualization/JavaArtifactBuildLevels.svg) ([Query](./cypher/Internal_Dependencies/Java_Artifact_build_levels_for_graphviz.cypher), [Source Script](./scripts/visualization/visualizeQueryResults.sh)) -- [Java Artifact Longest Path Contributors](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/path-finding-visualization/JavaArtifactLongestPaths.svg) ([Query](./cypher/Path_Finding/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher), [Source Script](./scripts/visualization/visualizeQueryResults.sh)) +- [Java Artifact Build Levels](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/internal-dependencies-visualization/JavaArtifactBuildLevels.svg) ([Query](./domains/internal-dependencies/queries/internal-dependencies/Java_Artifact_build_levels_for_graphviz.cypher), [Source Script](./scripts/visualization/visualizeQueryResults.sh)) +- [Java Artifact Longest Path Contributors](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/path-finding-visualization/JavaArtifactLongestPaths.svg) ([Query](./domains/internal-dependencies/queries/path-finding/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher), [Source Script](./scripts/visualization/visualizeQueryResults.sh)) - [Java Package Top #1 Authority Archetype and 
contributing packages](https://github.com/JohT/code-graph-analysis-examples/blob/main/analysis-results/AxonFramework/latest/anomaly-detection/Java_Package/GraphVisualizations/TopAuthority1.svg) ([Query](./domains/anomaly-detection/labels/AnomalyDetectionArchetypeAuthority.cypher), [Source Script](./domains/anomaly-detection/graphs/anomalyDetectionGraphs.sh))
 
 ## :book: Blog Articles
diff --git a/cypher/Cyclic_Dependencies/Cyclic_Dependencies_Concatenated.cypher b/cypher/Cyclic_Dependencies/Cyclic_Dependencies_Concatenated.cypher
deleted file mode 100644
index 07a0e5a20..000000000
--- a/cypher/Cyclic_Dependencies/Cyclic_Dependencies_Concatenated.cypher
+++ /dev/null
@@ -1,11 +0,0 @@
-//Cyclic Dependencies Concatenated
-MATCH (package:Package)-[:CONTAINS]->(forwardSource:Type)-[:DEPENDS_ON]->(forwardTarget:Type)<-[:CONTAINS]-(dependentPackage:Package)
-MATCH (dependentPackage)-[:CONTAINS]->(backwardSource:Type)-[:DEPENDS_ON]->(backwardTarget:Type)<-[:CONTAINS]-(package)
-WHERE package <> dependentPackage
- WITH package, dependentPackage
-     ,collect(DISTINCT forwardSource) + collect(DISTINCT backwardTarget) +
-      collect(DISTINCT backwardSource) + collect(DISTINCT backwardTarget)
-      AS dependencies
-UNWIND dependencies AS dependenciesUnwind
-RETURN package, dependentPackage
-      ,collect(DISTINCT dependenciesUnwind) AS dependencies
\ No newline at end of file
diff --git a/cypher/Cyclic_Dependencies/Cyclic_Dependencies_as_Nodes.cypher b/cypher/Cyclic_Dependencies/Cyclic_Dependencies_as_Nodes.cypher
deleted file mode 100644
index da1945e8d..000000000
--- a/cypher/Cyclic_Dependencies/Cyclic_Dependencies_as_Nodes.cypher
+++ /dev/null
@@ -1,7 +0,0 @@
-// Cyclic Dependencies
-MATCH (package:Package)-[:CONTAINS]->(type:Type)-[:DEPENDS_ON]->(dependentType:Type)<-[:CONTAINS]-(dependentPackage:Package)
-MATCH (dependentPackage)-[:CONTAINS]->(cycleType:Type)-[:DEPENDS_ON]->(cycleDependentType:Type)<-[:CONTAINS]-(package)
-WHERE package <> dependentPackage
-RETURN package, dependentPackage
-      ,type, dependentType, cycleType, cycleDependentType
- LIMIT 100
\ No newline at end of file
diff --git a/cypher/Dependencies_Projection/Dependencies_0_Check_Projection_Exists.cypher b/cypher/Dependencies_Projection/Dependencies_0_Check_Projection_Exists.cypher
new file mode 100644
index 000000000..fc0c57f29
--- /dev/null
+++ b/cypher/Dependencies_Projection/Dependencies_0_Check_Projection_Exists.cypher
@@ -0,0 +1,4 @@
+// Check if the projection exists. Variables: dependencies_projection
+
+RETURN CASE WHEN gds.graph.exists($dependencies_projection + '-cleaned') THEN 1
+            ELSE 0 END AS projectionCount
\ No newline at end of file
diff --git a/cypher/Internal_Dependencies/Candidates_for_Interface_Segregation.cypher b/cypher/Internal_Dependencies/Candidates_for_Interface_Segregation.cypher
deleted file mode 100644
index 0fa970345..000000000
--- a/cypher/Internal_Dependencies/Candidates_for_Interface_Segregation.cypher
+++ /dev/null
@@ -1,27 +0,0 @@
-// Candidates for Interface Segregation
-
-MATCH (type:Type)-[:DECLARES]->(method:Method)-[:INVOKES]->(dependentMethod:Method)
-MATCH (dependentMethod)<-[:DECLARES]-(dependentType:Type)
-MATCH (dependentType)-[:IMPLEMENTS*1..9]->(superType:Type)-[:DECLARES]->(inheritedMethod:Method)
-WHERE type.fqn <> dependentType.fqn
-  AND dependentMethod.name IS NOT NULL
-  AND inheritedMethod.name IS NOT NULL
-  AND dependentMethod.name <> '' // ignore constructors
-  AND inheritedMethod.name <> '' // ignore constructors
- WITH type.fqn AS fullTypeName
-     ,dependentType.fqn AS fullDependentTypeName
-     ,collect(DISTINCT dependentMethod.name) AS calledMethodNames
-     ,count(DISTINCT dependentMethod) AS calledMethods
-     // Count the different signatures without the return type
-     // of all declared methods including the inherited ones
-     ,count(DISTINCT split(method.signature, ' ')[1]) + count(DISTINCT split(inheritedMethod.signature, ' ')[1]) AS declaredMethods
-// Filter out types that declare only a few more methods than those that are actually used.
-// A good interface segregation candidate declares a lot of methods where only a few of them are used widely.
-WHERE declaredMethods > calledMethods + 2
- WITH fullDependentTypeName
-     ,declaredMethods
-     ,calledMethodNames
-     ,calledMethods
-     ,count(DISTINCT fullTypeName) AS callerTypes
- RETURN fullDependentTypeName, declaredMethods, calledMethodNames, calledMethods, callerTypes
- ORDER BY callerTypes DESC, declaredMethods DESC, fullDependentTypeName
\ No newline at end of file
diff --git a/cypher/Internal_Dependencies/Get_file_distance_as_shortest_contains_path_for_dependencies.cypher b/cypher/Internal_Dependencies/Get_file_distance_as_shortest_contains_path_for_dependencies.cypher
deleted file mode 100644
index 2838ea1df..000000000
--- a/cypher/Internal_Dependencies/Get_file_distance_as_shortest_contains_path_for_dependencies.cypher
+++ /dev/null
@@ -1,10 +0,0 @@
-// Get file distance distribution for dependencies (intuitively the fewest number of change directory commands needed)
-
- MATCH (source:File)-[dependency:DEPENDS_ON]->(target:File)
- WHERE dependency.fileDistanceAsFewestChangeDirectoryCommands IS NOT NULL
- RETURN dependency.fileDistanceAsFewestChangeDirectoryCommands
-       ,count(*) AS numberOfDependencies
-       ,count(DISTINCT source) AS numberOfDependencyUsers
-       ,count(DISTINCT target) AS numberOfDependencyProviders
-       ,collect(source.fileName + ' uses ' + target.fileName)[0..4] AS examples
- ORDER BY dependency.fileDistanceAsFewestChangeDirectoryCommands
\ No newline at end of file
diff --git a/cypher/Internal_Dependencies/List_types_that_are_used_by_many_different_packages.cypher b/cypher/Internal_Dependencies/List_types_that_are_used_by_many_different_packages.cypher
deleted file mode 100644
index 25c95565f..000000000
--- a/cypher/Internal_Dependencies/List_types_that_are_used_by_many_different_packages.cypher
+++ /dev/null
@@ -1,12 +0,0 @@
-// List types that are used by many different packages
-
-MATCH (artifact:Artifact)-[:CONTAINS]->(package:Package)-[:CONTAINS]->(type:Type)-[:DEPENDS_ON]->(dependentType:Type)<-[:CONTAINS]-(dependentPackage:Package)<-[:CONTAINS]-(dependentArtifact:Artifact)
-WHERE package <> dependentPackage
-WITH dependentType
-    ,labels(dependentType) AS dependentTypeLabels
-    ,COUNT(DISTINCT package.fqn) AS numberOfUsingPackages
-RETURN dependentType.fqn AS fullQualifiedDependentTypeName
-      ,dependentType.name AS dependentTypeName
-      ,dependentTypeLabels
-      ,numberOfUsingPackages
- ORDER BY numberOfUsingPackages DESC, dependentTypeName ASC
\ No newline at end of file
diff --git a/domains/internal-dependencies/COPIED_FILES.md b/domains/internal-dependencies/COPIED_FILES.md
new file mode 100644
index 000000000..30ea7cb6f
--- /dev/null
+++ b/domains/internal-dependencies/COPIED_FILES.md
@@ -0,0 +1,121 @@
+# Copied Files Tracking
+
+This document maps every original file that was copied into this domain to its copy location.
+It exists to support a future deprecation follow-up task that will remove or migrate the originals
+once this domain is the canonical implementation.
+
+> **Breaking change notice:** Output directories have changed. See the README for the new structure under `reports/internal-dependencies/`.
+> When the old scripts in `scripts/reports/` are eventually removed, a **major version bump** is required.
+
+---
+
+## Cypher Queries
+
+### Internal Dependencies (14 files)
+
+| Original | Copy |
+|----------|------|
+| `cypher/Internal_Dependencies/Candidates_for_Interface_Segregation.cypher` | `queries/internal-dependencies/Candidates_for_Interface_Segregation.cypher` |
+| `cypher/Internal_Dependencies/Get_file_distance_as_shortest_contains_path_for_dependencies.cypher` | `queries/internal-dependencies/Get_file_distance_as_shortest_contains_path_for_dependencies.cypher` |
+| `cypher/Internal_Dependencies/How_many_classes_compared_to_all_existing_in_the_same_package_are_used_by_dependent_packages_across_different_artifacts.cypher` | `queries/internal-dependencies/How_many_classes_compared_to_all_existing_in_the_same_package_are_used_by_dependent_packages_across_different_artifacts.cypher` |
+| `cypher/Internal_Dependencies/How_many_elements_compared_to_all_existing_are_used_by_dependent_modules_for_Typescript.cypher` | `queries/internal-dependencies/How_many_elements_compared_to_all_existing_are_used_by_dependent_modules_for_Typescript.cypher` |
+| `cypher/Internal_Dependencies/How_many_packages_compared_to_all_existing_are_used_by_dependent_artifacts.cypher` | `queries/internal-dependencies/How_many_packages_compared_to_all_existing_are_used_by_dependent_artifacts.cypher` |
+| `cypher/Internal_Dependencies/Inter_scan_and_project_dependencies_of_Typescript_modules.cypher` | `queries/internal-dependencies/Inter_scan_and_project_dependencies_of_Typescript_modules.cypher` |
+| `cypher/Internal_Dependencies/Java_Artifact_build_levels_for_graphviz.cypher` | `queries/internal-dependencies/Java_Artifact_build_levels_for_graphviz.cypher` |
+| `cypher/Internal_Dependencies/List_all_Java_artifacts.cypher` | `queries/internal-dependencies/List_all_Java_artifacts.cypher` |
+| `cypher/Internal_Dependencies/List_all_Typescript_modules.cypher` | `queries/internal-dependencies/List_all_Typescript_modules.cypher` |
+| `cypher/Internal_Dependencies/List_elements_that_are_used_by_many_different_modules_for_Typescript.cypher` | `queries/internal-dependencies/List_elements_that_are_used_by_many_different_modules_for_Typescript.cypher` |
+| `cypher/Internal_Dependencies/List_types_that_are_used_by_many_different_packages.cypher` | `queries/internal-dependencies/List_types_that_are_used_by_many_different_packages.cypher` |
+| `cypher/Internal_Dependencies/NPM_Package_build_levels_for_graphviz.cypher` | `queries/internal-dependencies/NPM_Package_build_levels_for_graphviz.cypher` |
+| `cypher/Internal_Dependencies/Set_file_distance_as_shortest_contains_path_for_dependencies.cypher` | `queries/internal-dependencies/Set_file_distance_as_shortest_contains_path_for_dependencies.cypher` |
+| `cypher/Internal_Dependencies/Typescript_Module_build_levels_for_graphviz.cypher` | `queries/internal-dependencies/Typescript_Module_build_levels_for_graphviz.cypher` |
+
+### Cyclic Dependencies (7 of 9 files)
+
+Excluded: `Cyclic_Dependencies_Concatenated.cypher` and `Cyclic_Dependencies_as_Nodes.cypher` — unreferenced by any script or notebook.
+
+| Original | Copy |
+|----------|------|
+| `cypher/Cyclic_Dependencies/Cyclic_Dependencies.cypher` | `queries/cyclic-dependencies/Cyclic_Dependencies.cypher` |
+| `cypher/Cyclic_Dependencies/Cyclic_Dependencies_between_Artifacts_as_unwinded_List.cypher` | `queries/cyclic-dependencies/Cyclic_Dependencies_between_Artifacts_as_unwinded_List.cypher` |
+| `cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.cypher` | `queries/cyclic-dependencies/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.cypher` |
+| `cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown_Backward_Only.cypher` | `queries/cyclic-dependencies/Cyclic_Dependencies_Breakdown_Backward_Only.cypher` |
+| `cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown_for_Typescript.cypher` | `queries/cyclic-dependencies/Cyclic_Dependencies_Breakdown_for_Typescript.cypher` |
+| `cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown.cypher` | `queries/cyclic-dependencies/Cyclic_Dependencies_Breakdown.cypher` |
+| `cypher/Cyclic_Dependencies/Cyclic_Dependencies_for_Typescript.cypher` | `queries/cyclic-dependencies/Cyclic_Dependencies_for_Typescript.cypher` |
+
+### Path Finding (15 files)
+
+| Original | Copy |
+|----------|------|
+| `cypher/Path_Finding/Path_Finding_1_Create_Projection.cypher` | `queries/path-finding/Path_Finding_1_Create_Projection.cypher` |
+| `cypher/Path_Finding/Path_Finding_2_Estimate_Memory.cypher` | `queries/path-finding/Path_Finding_2_Estimate_Memory.cypher` |
+| `cypher/Path_Finding/Path_Finding_3_Depth_First_Search_Path.cypher` | `queries/path-finding/Path_Finding_3_Depth_First_Search_Path.cypher` |
+| `cypher/Path_Finding/Path_Finding_4_Breadth_First_Search_Path.cypher` | `queries/path-finding/Path_Finding_4_Breadth_First_Search_Path.cypher` |
+| `cypher/Path_Finding/Path_Finding_5_All_pairs_shortest_path_distribution_overall.cypher` | `queries/path-finding/Path_Finding_5_All_pairs_shortest_path_distribution_overall.cypher` |
+| 
`cypher/Path_Finding/Path_Finding_5_All_pairs_shortest_path_distribution_per_project.cypher` | `queries/path-finding/Path_Finding_5_All_pairs_shortest_path_distribution_per_project.cypher` | +| `cypher/Path_Finding/Path_Finding_5_All_pairs_shortest_path_examples.cypher` | `queries/path-finding/Path_Finding_5_All_pairs_shortest_path_examples.cypher` | +| `cypher/Path_Finding/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher` | `queries/path-finding/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher` | +| `cypher/Path_Finding/Path_Finding_6_Longest_paths_distribution_overall.cypher` | `queries/path-finding/Path_Finding_6_Longest_paths_distribution_overall.cypher` | +| `cypher/Path_Finding/Path_Finding_6_Longest_paths_distribution_per_project.cypher` | `queries/path-finding/Path_Finding_6_Longest_paths_distribution_per_project.cypher` | +| `cypher/Path_Finding/Path_Finding_6_Longest_paths_examples.cypher` | `queries/path-finding/Path_Finding_6_Longest_paths_examples.cypher` | +| `cypher/Path_Finding/Path_Finding_6_Longest_paths_for_graphviz.cypher` | `queries/path-finding/Path_Finding_6_Longest_paths_for_graphviz.cypher` | +| `cypher/Path_Finding/Set_Parameters.cypher` | `queries/path-finding/Set_Parameters.cypher` | +| `cypher/Path_Finding/Set_Parameters_NonDevNpmPackage.cypher` | `queries/path-finding/Set_Parameters_NonDevNpmPackage.cypher` | +| `cypher/Path_Finding/Set_Parameters_Typescript_Module.cypher` | `queries/path-finding/Set_Parameters_Typescript_Module.cypher` | + +### Topological Sort (5 files) + +| Original | Copy | +|----------|------| +| `cypher/Topological_Sort/Set_Parameters.cypher` | `queries/topological-sort/Set_Parameters.cypher` | +| `cypher/Topological_Sort/Topological_Sort_Exists.cypher` | `queries/topological-sort/Topological_Sort_Exists.cypher` | +| `cypher/Topological_Sort/Topological_Sort_List.cypher` | `queries/topological-sort/Topological_Sort_List.cypher` | +| `cypher/Topological_Sort/Topological_Sort_Query.cypher` 
| `queries/topological-sort/Topological_Sort_Query.cypher` | +| `cypher/Topological_Sort/Topological_Sort_Write.cypher` | `queries/topological-sort/Topological_Sort_Write.cypher` | + +### Exploration Queries (2 files — not executed by CSV entry point) + +| Original | Copy | +|----------|------| +| `cypher/Artifact_Dependencies/Artifacts_with_duplicate_packages.cypher` | `queries/exploration/Artifacts_with_duplicate_packages.cypher` | +| `cypher/Java/Annotated_code_elements.cypher` | `queries/exploration/Annotated_code_elements.cypher` | + +--- + +## Jupyter Notebooks (4 files) + +| Original | Copy | Metadata Change | +|----------|------|-----------------| +| `jupyter/InternalDependenciesJava.ipynb` | `explore/InternalDependenciesJava.ipynb` | `ValidateJavaInternalDependencies` → `ValidateAlwaysFalse` | +| `jupyter/InternalDependenciesTypescript.ipynb` | `explore/InternalDependenciesTypescript.ipynb` | `ValidateTypescriptModuleDependencies` → `ValidateAlwaysFalse` | +| `jupyter/PathFindingJava.ipynb` | `explore/PathFindingJava.ipynb` | `ValidateJavaPackageDependencies` → `ValidateAlwaysFalse` | +| `jupyter/PathFindingTypescript.ipynb` | `explore/PathFindingTypescript.ipynb` | `ValidateTypescriptModuleDependencies` → `ValidateAlwaysFalse` | + +--- + +## Scripts Referenced but NOT Copied (Central Pipeline) + +These scripts are sourced from the central `scripts/` directory and are not duplicated: + +| Script | Used By | +|--------|---------| +| `scripts/executeQueryFunctions.sh` | All entry point scripts | +| `scripts/projectionFunctions.sh` | CSV, Visualization entry points | +| `scripts/cleanupAfterReportGeneration.sh` | CSV, Python, Visualization, Markdown scripts | +| `scripts/visualization/visualizeQueryResults.sh` | Graph visualization script | +| `scripts/markdown/embedMarkdownIncludes.sh` | Markdown summary script | + +--- + +## Old Scripts to Deprecate (Follow-up Task) + +Once this domain is the canonical implementation, the following scripts in 
`scripts/reports/` can be deprecated: + +| Old Script | Replacement | +|-----------|-------------| +| `scripts/reports/InternalDependenciesCsv.sh` | `domains/internal-dependencies/internalDependenciesCsv.sh` | +| `scripts/reports/PathFindingCsv.sh` | `domains/internal-dependencies/internalDependenciesCsv.sh` | +| `scripts/reports/TopologicalSortCsv.sh` | `domains/internal-dependencies/internalDependenciesCsv.sh` | +| `scripts/reports/InternalDependenciesVisualization.sh` | `domains/internal-dependencies/graphs/internalDependenciesGraphs.sh` | +| `scripts/reports/PathFindingVisualization.sh` | `domains/internal-dependencies/graphs/internalDependenciesGraphs.sh` | diff --git a/domains/internal-dependencies/PREREQUISITES.md b/domains/internal-dependencies/PREREQUISITES.md new file mode 100644 index 000000000..a5a22a0da --- /dev/null +++ b/domains/internal-dependencies/PREREQUISITES.md @@ -0,0 +1,117 @@ +# Internal Dependencies Domain — Prerequisites + +The following are provided by the central pipeline and must run **before** this domain executes. +They are not copied into this domain; they are sourced or referenced from the central pipeline locations. + +--- + +## 1. Neo4j Running with Scanned Artifacts + +Neo4j must be running and all artifacts must have been scanned and loaded into the graph database +before any script in this domain is executed. + +See the main [README.md](../../README.md) and [GETTING_STARTED.md](../../GETTING_STARTED.md) for setup instructions. + +--- + +## 2. DEPENDS_ON Relationships between Types + +The graph must contain `DEPENDS_ON` relationships between `Type`, `Package`, and `Artifact` nodes. +These are created by the jQAssistant scan step. + +--- + +## 3. 
Type Labels + +The following type classification labels must exist on the relevant nodes: + +| Label | Purpose | +|-------|---------| +| `PrimitiveType` | Java primitives (int, boolean, …) | +| `Void` | Java void return type | +| `JavaType` | Resolved Java type (class, interface, enum) | +| `ResolvedDuplicateType` | Duplicate type resolved across artifacts | + +**Cypher source:** [`cypher/Types/`](../../cypher/Types/) + +--- + +## 4. Weight Properties on DEPENDS_ON Relationships + +The following weight properties must exist on `DEPENDS_ON` relationships: + +| Property | Description | +|----------|-------------| +| `weight` | Total count of dependencies | +| `weightInterfaces` | Dependency weight counting only interface types | +| `weight25PercentInterfaces` | Blended weight: 75% class + 25% interface weight | + +**Cypher source:** [`cypher/DependsOn_Relationship_Weights/`](../../cypher/DependsOn_Relationship_Weights/) + +--- + +## 5. Dependencies Projection + +The Graph Data Science (GDS) library projection functions must be available. +Key functions used by path finding and topological sort: + +- `createDirectedDependencyProjection` — creates a directed in-memory graph projection for a given node label and weight property +- `createDirectedJavaTypeDependencyProjection` — specialized projection for Java `Type` nodes +- `deleteDirectedDependencyProjection` — removes the projection after use + +**Cypher source:** [`cypher/Dependencies_Projection/`](../../cypher/Dependencies_Projection/) + +> **Note:** A follow-up task is planned to rethink the placement of core dependency Cypher files within the pipeline. + +--- + +## 6. Projection Functions Shell Script + +The shell functions wrapping the Dependencies Projection Cypher queries are provided by: + +``` +scripts/projectionFunctions.sh +``` + +This script is sourced directly from `../../scripts/projectionFunctions.sh` by the domain entry point scripts. + +--- + +## 7. 
TypeScript Enrichment + +For TypeScript analyses, the following enrichment must have been applied: + +| Enrichment | Description | +|-----------|-------------| +| `namespace` property | Module namespace | +| `moduleName` property | Module name | +| `isNodeModule` property | Whether the module is a Node.js built-in | +| `isExternalImport` property | Whether the import is external | +| `IS_IMPLEMENTED_IN` relationships | Links TypeScript declarations to source modules | +| `DEPENDS_ON` between modules | Propagated from resolved imports | +| `PROVIDED_BY_NPM_DEPENDENCY` links | Links modules to npm package dependencies | +| `lowCouplingElement25PercentWeight` | Dependency weight for TypeScript modules | + +**Cypher source:** [`cypher/Typescript_Enrichment/`](../../cypher/Typescript_Enrichment/) + +--- + +## 8. General Enrichment + +The following properties must exist on `File` nodes, required for file distance calculations: + +| Property | Description | +|----------|-------------| +| `name` | Name of the file | +| `extension` | File extension | + +**Cypher source:** [`cypher/General_Enrichment/`](../../cypher/General_Enrichment/) + +--- + +## 9. Metrics (Indirect) + +Dependency degree calculations (incoming/outgoing `DEPENDS_ON` counts) are used indirectly +by several internal dependency queries. + +**Cypher source:** [`cypher/Metrics/`](../../cypher/Metrics/) diff --git a/domains/internal-dependencies/README.md b/domains/internal-dependencies/README.md new file mode 100644 index 000000000..6f4515130 --- /dev/null +++ b/domains/internal-dependencies/README.md @@ -0,0 +1,155 @@ +# Internal Dependencies Domain + +This directory contains the implementation and resources for analysing **internal dependencies** within the Code Graph Analysis Pipeline. It follows the vertical-slice domain pattern: all Cypher queries, Python chart scripts, shell scripts, and report templates needed for this analysis live here. 
+ +This domain covers four related analysis areas: + +- **Internal Dependencies**: How packages, artifacts, and TypeScript modules depend on each other — interface segregation, widely used types, usage ratios, and file distances. +- **Cyclic Dependencies**: Mutual dependency cycles between Java packages, Java artifacts, and TypeScript modules — with metrics to prioritise which backward dependencies to remove. +- **Path Finding**: All-pairs shortest path and longest path algorithms — revealing dependency depth, graph diameter, and worst-case transitive chains. +- **Topological Sort**: Build ordering across all abstraction levels — packages, artifacts, types, modules, and NPM packages. + +## Entry Points + +The following scripts are discovered and invoked automatically by the central compilation scripts in [scripts/reports/compilations/](../../scripts/reports/compilations/). They are found by filename pattern. + +- [internalDependenciesCsv.sh](./internalDependenciesCsv.sh): Entry point for CSV reports based on Cypher queries. Discovered by `CsvReports.sh` (`*Csv.sh` pattern). +- [internalDependenciesPython.sh](./internalDependenciesPython.sh): Entry point for Python-based SVG chart generation. Discovered by `PythonReports.sh` (`*Python.sh` pattern). +- [internalDependenciesVisualization.sh](./internalDependenciesVisualization.sh): Entry point for graph visualizations. Discovered by `VisualizationReports.sh` (`*Visualization.sh` pattern). +- [internalDependenciesMarkdown.sh](./internalDependenciesMarkdown.sh): Entry point for the Markdown summary report. Discovered by `MarkdownReports.sh` (`*Markdown.sh` pattern). 
+ +## Folder Structure + +``` +domains/internal-dependencies/ +├── README.md # This file +├── PREREQUISITES.md # Detailed prerequisite documentation +├── COPIED_FILES.md # Original → copy mapping for deprecation follow-up +├── internalDependenciesCsv.sh # Entry point: CSV reports +├── internalDependenciesPython.sh # Entry point: Python charts +├── internalDependenciesVisualization.sh # Entry point: Graph visualizations +├── internalDependenciesMarkdown.sh # Entry point: Markdown summary +├── pathFindingCharts.py # Chart generator: path finding bar + pie SVGs +├── explore/ # Jupyter notebooks for interactive exploration +│ ├── InternalDependenciesJava.ipynb +│ ├── InternalDependenciesTypescript.ipynb +│ ├── PathFindingJava.ipynb +│ └── PathFindingTypescript.ipynb +├── queries/ +│ ├── internal-dependencies/ # 14 Cypher queries (internal structure) +│ ├── cyclic-dependencies/ # 7 Cypher queries (cycle analysis) +│ ├── path-finding/ # 15 Cypher queries (path algorithms) +│ ├── topological-sort/ # 5 Cypher queries (build ordering) +│ └── exploration/ # 2 Cypher queries (explore notebooks only) +├── graphs/ +│ └── internalDependenciesGraphs.sh # Graph visualization orchestration +└── summary/ + ├── internalDependenciesSummary.sh # Markdown assembly logic + └── report.template.md # Main report template +``` + +## Prerequisites + +This domain requires the following to be in place before running. These are provided by the central pipeline and are **not** set up by this domain. See [PREREQUISITES.md](./PREREQUISITES.md) for full details. 
+ +- Neo4j running with scanned artifacts loaded +- `DEPENDS_ON` relationships between `Type`, `Package`, and `Artifact` nodes +- Type labels (`PrimitiveType`, `Void`, `JavaType`, `ResolvedDuplicateType`) from [`cypher/Types/`](../../cypher/Types/) +- Weight properties (`weight`, `weightInterfaces`, `weight25PercentInterfaces`) from [`cypher/DependsOn_Relationship_Weights/`](../../cypher/DependsOn_Relationship_Weights/) +- Dependencies Projection functions from [`cypher/Dependencies_Projection/`](../../cypher/Dependencies_Projection/) and [`scripts/projectionFunctions.sh`](../../scripts/projectionFunctions.sh) +- TypeScript enrichment from [`cypher/Typescript_Enrichment/`](../../cypher/Typescript_Enrichment/) +- General enrichment (`name`, `extension` on `File` nodes) from [`cypher/General_Enrichment/`](../../cypher/General_Enrichment/) + +## Execution Order + +1. **`internalDependenciesCsv.sh`** — runs Cypher queries, writes CSV files +2. **`internalDependenciesPython.sh`** — reads CSV data, generates SVG charts +3. **`internalDependenciesVisualization.sh`** — generates GraphViz DOT → SVG graph visualizations +4. 
**`internalDependenciesMarkdown.sh`** — assembles the final Markdown report + +## What This Domain Produces + +All output goes into `reports/internal-dependencies/`, organised by abstraction level: + +``` +reports/internal-dependencies/ +├── Distance_distribution_between_dependent_files.csv +├── Java_Artifact/ +│ ├── CyclicArtifactDependenciesUnwinded.csv +│ ├── List_all_Java_artifacts.csv +│ ├── ArtifactPackageUsage.csv +│ ├── ClassesPerPackageUsageAcrossArtifacts.csv +│ ├── Artifact_all_pairs_shortest_paths_distribution_per_project.csv +│ ├── Artifact_longest_paths_distribution.csv +│ ├── Artifact_Topological_Sort.csv +│ └── Graph_Visualizations/ +│ ├── JavaArtifactBuildLevels.{csv,dot,svg} +│ ├── JavaArtifactLongestPathsIsolated.{csv,dot,svg} +│ └── JavaArtifactLongestPaths.{csv,dot,svg} +├── Java_Package/ +│ ├── Cyclic_Dependencies.csv +│ ├── Cyclic_Dependencies_Breakdown.csv +│ ├── Cyclic_Dependencies_Breakdown_Backward_Only.csv +│ ├── InterfaceSegregationCandidates.csv +│ ├── WidelyUsedTypes.csv +│ ├── Package_all_pairs_shortest_paths_distribution_per_project.csv +│ ├── Package_longest_paths_distribution.csv +│ └── Package_Topological_Sort.csv +├── Java_Type/ +│ └── Type_Topological_Sort.csv +├── Typescript_Module/ +│ ├── Cyclic_Dependencies_for_Typescript.csv +│ ├── Cyclic_Dependencies_Breakdown_for_Typescript.csv +│ ├── Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.csv +│ ├── List_all_Typescript_modules.csv +│ ├── WidelyUsedTypescriptElements.csv +│ ├── ModuleElementsUsageTypescript.csv +│ ├── Module_all_pairs_shortest_paths_distribution_per_project.csv +│ ├── Module_longest_paths_distribution.csv +│ ├── Module_Topological_Sort.csv +│ └── Graph_Visualizations/ +│ ├── TypeScriptModuleBuildLevels.{csv,dot,svg} +│ ├── TypeScriptModuleLongestPathsIsolated.{csv,dot,svg} +│ └── TypeScriptModuleLongestPaths.{csv,dot,svg} +├── NPM_NonDevPackage/ +│ ├── NpmNonDevPackage_all_pairs_shortest_paths_distribution_per_project.csv +│ ├── 
NpmNonDevPackage_longest_paths_distribution.csv +│ ├── NpmNonDevPackage_Topological_Sort.csv +│ └── Graph_Visualizations/ +│ ├── NpmPackageBuildLevels.{csv,dot,svg} +│ ├── NpmNonDevPackageLongestPathsIsolated.{csv,dot,svg} +│ └── NpmNonDevPackageLongestPaths.{csv,dot,svg} +└── NPM_DevPackage/ + ├── NpmDevPackage_all_pairs_shortest_paths_distribution_per_project.csv + ├── NpmDevPackage_longest_paths_distribution.csv + ├── NpmDevPackage_Topological_Sort.csv + └── Graph_Visualizations/ + ├── NpmDevPackageLongestPathsIsolated.{csv,dot,svg} + └── NpmDevPackageLongestPaths.{csv,dot,svg} +``` + +### SVG Charts (`reports/internal-dependencies/`) + +Python-generated charts from [pathFindingCharts.py](./pathFindingCharts.py): + +- **Java Package**: all-pairs shortest path (bar, pie, stacked log, stacked normalised, diameter) + longest path (bar, pie, stacked log, stacked normalised, max per artifact) +- **Java Artifact**: all-pairs shortest path (bar, pie) + longest path (bar, pie) +- **TypeScript Module**: all-pairs shortest path and longest path charts (same set as Java Package) +- **NPM packages**: same chart pattern where data exists + +### Markdown Summary (`reports/internal-dependencies/internal_dependencies_report.md`) + +A structured report covering cyclic dependencies, internal structure analysis, path finding insights, topological build levels, graph visualizations, and a glossary. + +## Breaking Change Note + +This domain uses a **new output directory** (`reports/internal-dependencies/`) consolidating what was previously split across: + +- `reports/internal-dependencies-csv/` +- `reports/path-finding-csv/` +- `reports/topology-csv/` +- `reports/internal-dependencies-visualization/` +- `reports/path-finding-visualization/` + +When the old scripts in `scripts/reports/` are eventually removed, a **major version bump** is required. +See [COPIED_FILES.md](./COPIED_FILES.md) for the full deprecation tracking. 
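Note on the README above: it states that entry points are found by filename pattern (`*Csv.sh`, `*Python.sh`, and so on). The following is a minimal sketch of how such discovery could work, not the actual logic of `CsvReports.sh`. The function name `find_entry_points`, the search depth, and the temporary demo directory are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: list domain entry points by filename pattern.
# Not taken from the central compilation scripts.
set -o errexit -o pipefail

# find_entry_points <domains_directory> <pattern>
# Prints scripts directly inside a domain folder that match the pattern, sorted.
# Assumption: entry points live at the top level of each domain (depth 2),
# so helper scripts in sub folders such as "summary/" are not picked up.
find_entry_points() {
    local domains_directory="$1"
    local pattern="$2"
    find "${domains_directory}" -maxdepth 2 -type f -name "${pattern}" | sort
}

# Demo with a throwaway directory tree (file names mirror the README, content is empty).
demo_directory=$(mktemp -d)
mkdir -p "${demo_directory}/internal-dependencies/summary"
touch "${demo_directory}/internal-dependencies/internalDependenciesCsv.sh"
touch "${demo_directory}/internal-dependencies/internalDependenciesPython.sh"
touch "${demo_directory}/internal-dependencies/summary/internalDependenciesSummary.sh"

# Lists only the CSV entry point of the demo domain.
find_entry_points "${demo_directory}" "*Csv.sh"
```

Limiting the search depth is one possible way to keep helper scripts such as `summary/internalDependenciesSummary.sh` from being treated as entry points; the real pipeline may solve this differently.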
diff --git a/jupyter/InternalDependenciesJava.ipynb b/domains/internal-dependencies/explore/InternalDependenciesJava.ipynb similarity index 99% rename from jupyter/InternalDependenciesJava.ipynb rename to domains/internal-dependencies/explore/InternalDependenciesJava.ipynb index 83a21b3b3..97c140ab1 100644 --- a/jupyter/InternalDependenciesJava.ipynb +++ b/domains/internal-dependencies/explore/InternalDependenciesJava.ipynb @@ -614,7 +614,7 @@ "name": "JohT" } ], - "code_graph_analysis_pipeline_data_validation": "ValidateJavaInternalDependencies", + "code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse", "kernelspec": { "display_name": "codegraph", "language": "python", @@ -636,4 +636,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file diff --git a/jupyter/InternalDependenciesTypescript.ipynb b/domains/internal-dependencies/explore/InternalDependenciesTypescript.ipynb similarity index 99% rename from jupyter/InternalDependenciesTypescript.ipynb rename to domains/internal-dependencies/explore/InternalDependenciesTypescript.ipynb index 879ca767f..b8bf1d72b 100644 --- a/jupyter/InternalDependenciesTypescript.ipynb +++ b/domains/internal-dependencies/explore/InternalDependenciesTypescript.ipynb @@ -456,7 +456,7 @@ "name": "JohT" } ], - "code_graph_analysis_pipeline_data_validation": "ValidateTypescriptModuleDependencies", + "code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse", "kernelspec": { "display_name": "codegraph", "language": "python", @@ -478,4 +478,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file diff --git a/jupyter/PathFindingJava.ipynb b/domains/internal-dependencies/explore/PathFindingJava.ipynb similarity index 99% rename from jupyter/PathFindingJava.ipynb rename to domains/internal-dependencies/explore/PathFindingJava.ipynb index 39db42d93..ce8275721 100644 --- a/jupyter/PathFindingJava.ipynb +++ b/domains/internal-dependencies/explore/PathFindingJava.ipynb @@ -1532,7 
+1532,7 @@ "name": "JohT" } ], - "code_graph_analysis_pipeline_data_validation": "ValidateJavaPackageDependencies", + "code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse", "kernelspec": { "display_name": "codegraph", "language": "python", @@ -1554,4 +1554,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file diff --git a/jupyter/PathFindingTypescript.ipynb b/domains/internal-dependencies/explore/PathFindingTypescript.ipynb similarity index 99% rename from jupyter/PathFindingTypescript.ipynb rename to domains/internal-dependencies/explore/PathFindingTypescript.ipynb index 1006d66bc..59bdf77fe 100644 --- a/jupyter/PathFindingTypescript.ipynb +++ b/domains/internal-dependencies/explore/PathFindingTypescript.ipynb @@ -1567,7 +1567,7 @@ "name": "JohT" } ], - "code_graph_analysis_pipeline_data_validation": "ValidateTypescriptModuleDependencies", + "code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse", "kernelspec": { "display_name": "codegraph", "language": "python", @@ -1589,4 +1589,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file diff --git a/domains/internal-dependencies/graphs/internalDependenciesGraphs.sh b/domains/internal-dependencies/graphs/internalDependenciesGraphs.sh new file mode 100755 index 000000000..aa35db142 --- /dev/null +++ b/domains/internal-dependencies/graphs/internalDependenciesGraphs.sh @@ -0,0 +1,204 @@ +#!/usr/bin/env bash + +# Executes internal dependency and path finding Cypher queries for GraphViz visualization. +# Visualizes Java Artifact, TypeScript Module, and NPM Package dependencies with build levels +# and longest paths. +# +# Build level graphs use the topological sort level to colour nodes (showing dependency hierarchy). +# Longest path graphs highlight the worst-case dependency chains. 
+# +# The reports (csv, dot and svg files) will be written into +# reports/internal-dependencies/{abstraction_level}/Graph_Visualizations/ + +# Requires executeQueryFunctions.sh, projectionFunctions.sh, visualizeQueryResults.sh, +# cleanupAfterReportGeneration.sh + +# Fail on any error ("-e" = exit on first error, "-o pipefail" exit on errors within piped commands) +set -o errexit -o pipefail + +# Overridable Constants (defaults also defined in sub scripts) +REPORTS_DIRECTORY=${REPORTS_DIRECTORY:-"reports"} +SCRIPT_NAME="internalDependenciesGraphs" + +## Get this "domains/internal-dependencies/graphs" directory if not already set +# Even though $BASH_SOURCE is made for Bourne-like shells, it is also supported by others and is therefore the preferred solution here. +# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes. +# This way non-standard tools like readlink aren't needed. +INTERNAL_DEPENDENCIES_GRAPHS_DIR=${INTERNAL_DEPENDENCIES_GRAPHS_DIR:-$(CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)} +# echo "${SCRIPT_NAME}: INTERNAL_DEPENDENCIES_GRAPHS_DIR=${INTERNAL_DEPENDENCIES_GRAPHS_DIR}" + +# Get the "scripts" directory by navigating three levels up from this graphs directory. +SCRIPTS_DIR=${SCRIPTS_DIR:-"${INTERNAL_DEPENDENCIES_GRAPHS_DIR}/../../../scripts"} + +# Get the "scripts/visualization" directory. 
+VISUALIZATION_SCRIPTS_DIR=${VISUALIZATION_SCRIPTS_DIR:-"${SCRIPTS_DIR}/visualization"} + +# Cypher query directories +INTERNAL_DEPS_CYPHER_DIR="${INTERNAL_DEPENDENCIES_GRAPHS_DIR}/../queries/internal-dependencies" +PATH_FINDING_CYPHER_DIR="${INTERNAL_DEPENDENCIES_GRAPHS_DIR}/../queries/path-finding" +TOPOLOGICAL_SORT_CYPHER_DIR="${INTERNAL_DEPENDENCIES_GRAPHS_DIR}/../queries/topological-sort" + +# Define functions to execute cypher queries from within a given file +source "${SCRIPTS_DIR}/executeQueryFunctions.sh" + +# Define functions to create and delete Graph Projections like "createDirectedDependencyProjection" +source "${SCRIPTS_DIR}/projectionFunctions.sh" + +# Main report directory +REPORT_NAME="internal-dependencies" +FULL_REPORT_DIRECTORY="${REPORTS_DIRECTORY}/${REPORT_NAME}" +mkdir -p "${FULL_REPORT_DIRECTORY}" + +# ── Java Artifact Visualizations ────────────────────────────────────────────── + +ARTIFACT_GRAPH_VIZ_DIR="${FULL_REPORT_DIRECTORY}/Java_Artifact/Graph_Visualizations" +mkdir -p "${ARTIFACT_GRAPH_VIZ_DIR}" + +# Build levels graph (from internal dependencies build level query) +echo "${SCRIPT_NAME}: Creating visualization JavaArtifactBuildLevels..." +execute_cypher "${INTERNAL_DEPS_CYPHER_DIR}/Java_Artifact_build_levels_for_graphviz.cypher" \ + > "${ARTIFACT_GRAPH_VIZ_DIR}/JavaArtifactBuildLevels.csv" +source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${ARTIFACT_GRAPH_VIZ_DIR}/JavaArtifactBuildLevels.csv" + +ARTIFACT_PROJECTION="dependencies_projection=artifact-path-finding" +ARTIFACT_NODE="dependencies_projection_node=Artifact" +ARTIFACT_WEIGHT="dependencies_projection_weight_property=weight" + +if createDirectedDependencyProjection "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}"; then + # Ensure topological sort level exists on nodes (required for level coloring in graph). 
+ execute_cypher_queries_until_results \ + "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Exists.cypher" \ + "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Write.cypher" "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" + + echo "${SCRIPT_NAME}: Creating visualization JavaArtifactLongestPathsIsolated..." + execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_6_Longest_paths_for_graphviz.cypher" \ + "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}" \ + > "${ARTIFACT_GRAPH_VIZ_DIR}/JavaArtifactLongestPathsIsolated.csv" + source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${ARTIFACT_GRAPH_VIZ_DIR}/JavaArtifactLongestPathsIsolated.csv" + + echo "${SCRIPT_NAME}: Creating visualization JavaArtifactLongestPaths..." + execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher" \ + "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}" \ + > "${ARTIFACT_GRAPH_VIZ_DIR}/JavaArtifactLongestPaths.csv" + source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${ARTIFACT_GRAPH_VIZ_DIR}/JavaArtifactLongestPaths.csv" +fi + +# Clean-up Java Artifact graph visualizations +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${ARTIFACT_GRAPH_VIZ_DIR}" + +# ── TypeScript Module Visualizations ────────────────────────────────────────── + +MODULE_GRAPH_VIZ_DIR="${FULL_REPORT_DIRECTORY}/Typescript_Module/Graph_Visualizations" +mkdir -p "${MODULE_GRAPH_VIZ_DIR}" + +# Build levels graph (from internal dependencies build level query) +echo "${SCRIPT_NAME}: Creating visualization TypeScriptModuleBuildLevels..." 
+execute_cypher "${INTERNAL_DEPS_CYPHER_DIR}/Typescript_Module_build_levels_for_graphviz.cypher" \ + > "${MODULE_GRAPH_VIZ_DIR}/TypeScriptModuleBuildLevels.csv" +source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${MODULE_GRAPH_VIZ_DIR}/TypeScriptModuleBuildLevels.csv" + +MODULE_LANGUAGE="dependencies_projection_language=Typescript" +MODULE_PROJECTION="dependencies_projection=typescript-module-path-finding" +MODULE_NODE="dependencies_projection_node=Module" +MODULE_WEIGHT="dependencies_projection_weight_property=lowCouplingElement25PercentWeight" + +if createDirectedDependencyProjection "${MODULE_LANGUAGE}" "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}"; then + # Ensure topological sort level exists on nodes. + execute_cypher_queries_until_results \ + "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Exists.cypher" \ + "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Write.cypher" "${MODULE_PROJECTION}" "${MODULE_NODE}" + + echo "${SCRIPT_NAME}: Creating visualization TypeScriptModuleLongestPathsIsolated..." + execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_6_Longest_paths_for_graphviz.cypher" \ + "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}" \ + > "${MODULE_GRAPH_VIZ_DIR}/TypeScriptModuleLongestPathsIsolated.csv" + source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${MODULE_GRAPH_VIZ_DIR}/TypeScriptModuleLongestPathsIsolated.csv" + + echo "${SCRIPT_NAME}: Creating visualization TypeScriptModuleLongestPaths..." 
+ execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher" \ + "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}" \ + > "${MODULE_GRAPH_VIZ_DIR}/TypeScriptModuleLongestPaths.csv" + source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${MODULE_GRAPH_VIZ_DIR}/TypeScriptModuleLongestPaths.csv" +fi + +# Clean-up TypeScript Module graph visualizations +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${MODULE_GRAPH_VIZ_DIR}" + +# ── NPM Non-Dev Package Visualizations ──────────────────────────────────────── + +NPM_GRAPH_VIZ_DIR="${FULL_REPORT_DIRECTORY}/NPM_NonDevPackage/Graph_Visualizations" +mkdir -p "${NPM_GRAPH_VIZ_DIR}" + +# Build levels graph (from internal dependencies build level query) +echo "${SCRIPT_NAME}: Creating visualization NpmPackageBuildLevels..." +execute_cypher "${INTERNAL_DEPS_CYPHER_DIR}/NPM_Package_build_levels_for_graphviz.cypher" \ + > "${NPM_GRAPH_VIZ_DIR}/NpmPackageBuildLevels.csv" +source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${NPM_GRAPH_VIZ_DIR}/NpmPackageBuildLevels.csv" + +NPM_LANGUAGE="dependencies_projection_language=NPM" +NPM_PROJECTION="dependencies_projection=npm-non-dev-package-path-finding" +NPM_NODE="dependencies_projection_node=NpmNonDevPackage" +NPM_WEIGHT="dependencies_projection_weight_property=weightByDependencyType" + +if createDirectedDependencyProjection "${NPM_LANGUAGE}" "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}"; then + # Ensure topological sort level exists on nodes. + execute_cypher_queries_until_results \ + "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Exists.cypher" \ + "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Write.cypher" "${NPM_PROJECTION}" "${NPM_NODE}" + + echo "${SCRIPT_NAME}: Creating visualization NpmNonDevPackageLongestPathsIsolated..." 
+ execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_6_Longest_paths_for_graphviz.cypher" \ + "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}" \ + > "${NPM_GRAPH_VIZ_DIR}/NpmNonDevPackageLongestPathsIsolated.csv" + source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${NPM_GRAPH_VIZ_DIR}/NpmNonDevPackageLongestPathsIsolated.csv" + + echo "${SCRIPT_NAME}: Creating visualization NpmNonDevPackageLongestPaths..." + execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher" \ + "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}" \ + > "${NPM_GRAPH_VIZ_DIR}/NpmNonDevPackageLongestPaths.csv" + source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${NPM_GRAPH_VIZ_DIR}/NpmNonDevPackageLongestPaths.csv" +fi + +# Clean-up NPM Non-Dev Package graph visualizations +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${NPM_GRAPH_VIZ_DIR}" + +# ── NPM Dev Package Visualizations ──────────────────────────────────────────── + +NPM_DEV_GRAPH_VIZ_DIR="${FULL_REPORT_DIRECTORY}/NPM_DevPackage/Graph_Visualizations" +mkdir -p "${NPM_DEV_GRAPH_VIZ_DIR}" + +NPM_DEV_PROJECTION="dependencies_projection=npm-dev-package-path-finding" +NPM_DEV_NODE="dependencies_projection_node=NpmDevPackage" +NPM_DEV_WEIGHT="dependencies_projection_weight_property=weightByDependencyType" + +if createDirectedDependencyProjection "${NPM_LANGUAGE}" "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}"; then + # Ensure topological sort level exists on nodes. + execute_cypher_queries_until_results \ + "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Exists.cypher" \ + "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Write.cypher" "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" + + echo "${SCRIPT_NAME}: Creating visualization NpmDevPackageLongestPathsIsolated..." 
+ execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_6_Longest_paths_for_graphviz.cypher" \ + "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}" \ + > "${NPM_DEV_GRAPH_VIZ_DIR}/NpmDevPackageLongestPathsIsolated.csv" + source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${NPM_DEV_GRAPH_VIZ_DIR}/NpmDevPackageLongestPathsIsolated.csv" + + echo "${SCRIPT_NAME}: Creating visualization NpmDevPackageLongestPaths..." + execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher" \ + "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}" \ + > "${NPM_DEV_GRAPH_VIZ_DIR}/NpmDevPackageLongestPaths.csv" + source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${NPM_DEV_GRAPH_VIZ_DIR}/NpmDevPackageLongestPaths.csv" +fi + +# Clean-up NPM Dev Package graph visualizations +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${NPM_DEV_GRAPH_VIZ_DIR}" + +# Clean-up empty level directories. +# These may have been recreated by mkdir -p above even if there was no data, +# in which case cleanupAfterReportGeneration.sh deletes them since they are empty. +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}/Java_Artifact" +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}/Typescript_Module" +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}/NPM_NonDevPackage" +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}/NPM_DevPackage" + +echo "${SCRIPT_NAME}: $(date +'%Y-%m-%dT%H:%M:%S%z') Successfully finished." 
diff --git a/domains/internal-dependencies/internalDependenciesCsv.sh b/domains/internal-dependencies/internalDependenciesCsv.sh new file mode 100755 index 000000000..beed146b9 --- /dev/null +++ b/domains/internal-dependencies/internalDependenciesCsv.sh @@ -0,0 +1,299 @@ +#!/usr/bin/env bash + +# Pipeline that coordinates internal dependency analysis using Cypher queries and the +# Graph Data Science Library of Neo4j. It covers internal dependencies, cyclic dependencies, +# path finding, and topological sort across multiple abstraction levels. +# It requires an already running Neo4j graph database with already scanned and analyzed artifacts. +# The results will be written into the sub directory reports/internal-dependencies. +# Dynamically triggered by "CsvReports.sh". + +# Note that "scripts/prepareAnalysis.sh" is required to run prior to this script. + +# Requires executeQueryFunctions.sh, projectionFunctions.sh, cleanupAfterReportGeneration.sh + +# Fail on any error ("-e" = exit on first error, "-o pipefail" exits on errors within piped commands) +set -o errexit -o pipefail + +# Overrideable Constants (defaults also defined in sub scripts) +REPORTS_DIRECTORY=${REPORTS_DIRECTORY:-"reports"} + +## Get this "domains/internal-dependencies" directory if not already set +# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution. +# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes. +# This way non-standard tools like readlink aren't needed. +INTERNAL_DEPENDENCIES_SCRIPT_DIR=${INTERNAL_DEPENDENCIES_SCRIPT_DIR:-$(CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)} +echo "internalDependenciesCsv: INTERNAL_DEPENDENCIES_SCRIPT_DIR=${INTERNAL_DEPENDENCIES_SCRIPT_DIR}" + +# Get the "scripts" directory by navigating two levels up from this domain directory.
+SCRIPTS_DIR=${SCRIPTS_DIR:-"${INTERNAL_DEPENDENCIES_SCRIPT_DIR}/../../scripts"} + +# Cypher query directories within this domain +INTERNAL_DEPS_CYPHER_DIR="${INTERNAL_DEPENDENCIES_SCRIPT_DIR}/queries/internal-dependencies" +CYCLIC_DEPS_CYPHER_DIR="${INTERNAL_DEPENDENCIES_SCRIPT_DIR}/queries/cyclic-dependencies" +PATH_FINDING_CYPHER_DIR="${INTERNAL_DEPENDENCIES_SCRIPT_DIR}/queries/path-finding" +TOPOLOGICAL_SORT_CYPHER_DIR="${INTERNAL_DEPENDENCIES_SCRIPT_DIR}/queries/topological-sort" + +# Define functions to execute a cypher query from within a given file like "execute_cypher" and "execute_cypher_queries_until_results" +source "${SCRIPTS_DIR}/executeQueryFunctions.sh" + +# Define functions to create and delete Graph Projections like "createDirectedDependencyProjection" +source "${SCRIPTS_DIR}/projectionFunctions.sh" + +# Create main report directory and abstraction-level subdirectories +REPORT_NAME="internal-dependencies" +FULL_REPORT_DIRECTORY="${REPORTS_DIRECTORY}/${REPORT_NAME}" +mkdir -p "${FULL_REPORT_DIRECTORY}" +mkdir -p "${FULL_REPORT_DIRECTORY}/Java_Artifact" +mkdir -p "${FULL_REPORT_DIRECTORY}/Java_Package" +mkdir -p "${FULL_REPORT_DIRECTORY}/Java_Type" +mkdir -p "${FULL_REPORT_DIRECTORY}/Typescript_Module" +mkdir -p "${FULL_REPORT_DIRECTORY}/NPM_NonDevPackage" +mkdir -p "${FULL_REPORT_DIRECTORY}/NPM_DevPackage" + +# ── Internal Dependencies ───────────────────────────────────────────────────── + +echo "internalDependenciesCsv: $(date +'%Y-%m-%dT%H:%M:%S%z') Calculating distance between dependent files..." +execute_cypher_queries_until_results \ + "${INTERNAL_DEPS_CYPHER_DIR}/Get_file_distance_as_shortest_contains_path_for_dependencies.cypher" \ + "${INTERNAL_DEPS_CYPHER_DIR}/Set_file_distance_as_shortest_contains_path_for_dependencies.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Distance_distribution_between_dependent_files.csv" + +echo "internalDependenciesCsv: $(date +'%Y-%m-%dT%H:%M:%S%z') Processing internal dependencies for Java..." 
+ +execute_cypher "${CYCLIC_DEPS_CYPHER_DIR}/Cyclic_Dependencies.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Java_Package/Cyclic_Dependencies.csv" +execute_cypher "${CYCLIC_DEPS_CYPHER_DIR}/Cyclic_Dependencies_Breakdown.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Java_Package/Cyclic_Dependencies_Breakdown.csv" +execute_cypher "${CYCLIC_DEPS_CYPHER_DIR}/Cyclic_Dependencies_Breakdown_Backward_Only.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Java_Package/Cyclic_Dependencies_Breakdown_Backward_Only.csv" +execute_cypher "${CYCLIC_DEPS_CYPHER_DIR}/Cyclic_Dependencies_between_Artifacts_as_unwinded_List.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Java_Artifact/CyclicArtifactDependenciesUnwinded.csv" + +execute_cypher "${INTERNAL_DEPS_CYPHER_DIR}/Candidates_for_Interface_Segregation.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Java_Package/InterfaceSegregationCandidates.csv" + +execute_cypher "${INTERNAL_DEPS_CYPHER_DIR}/List_all_Java_artifacts.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Java_Artifact/List_all_Java_artifacts.csv" +execute_cypher "${INTERNAL_DEPS_CYPHER_DIR}/List_types_that_are_used_by_many_different_packages.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Java_Package/WidelyUsedTypes.csv" +execute_cypher "${INTERNAL_DEPS_CYPHER_DIR}/How_many_packages_compared_to_all_existing_are_used_by_dependent_artifacts.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Java_Artifact/ArtifactPackageUsage.csv" +execute_cypher "${INTERNAL_DEPS_CYPHER_DIR}/How_many_classes_compared_to_all_existing_in_the_same_package_are_used_by_dependent_packages_across_different_artifacts.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Java_Artifact/ClassesPerPackageUsageAcrossArtifacts.csv" + +echo "internalDependenciesCsv: $(date +'%Y-%m-%dT%H:%M:%S%z') Processing internal dependencies for TypeScript..." 
+ +execute_cypher "${CYCLIC_DEPS_CYPHER_DIR}/Cyclic_Dependencies_for_Typescript.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Typescript_Module/Cyclic_Dependencies_for_Typescript.csv" +execute_cypher "${CYCLIC_DEPS_CYPHER_DIR}/Cyclic_Dependencies_Breakdown_for_Typescript.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Typescript_Module/Cyclic_Dependencies_Breakdown_for_Typescript.csv" +execute_cypher "${CYCLIC_DEPS_CYPHER_DIR}/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Typescript_Module/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.csv" + +execute_cypher "${INTERNAL_DEPS_CYPHER_DIR}/List_all_Typescript_modules.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Typescript_Module/List_all_Typescript_modules.csv" +execute_cypher "${INTERNAL_DEPS_CYPHER_DIR}/List_elements_that_are_used_by_many_different_modules_for_Typescript.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Typescript_Module/WidelyUsedTypescriptElements.csv" +execute_cypher "${INTERNAL_DEPS_CYPHER_DIR}/How_many_elements_compared_to_all_existing_are_used_by_dependent_modules_for_Typescript.cypher" \ + > "${FULL_REPORT_DIRECTORY}/Typescript_Module/ModuleElementsUsageTypescript.csv" + +# ── Path Finding ────────────────────────────────────────────────────────────── + +echo "internalDependenciesCsv: $(date +'%Y-%m-%dT%H:%M:%S%z') Starting path finding..." + +# Run the path finding algorithm "All Pairs Shortest Path" for a single node label. +# +# Required Parameters: +# - dependencies_projection=... Name prefix for the in-memory projection. Example: "artifact-path-finding" +# - dependencies_projection_node=... Node label. Example: "Artifact" +# - dependencies_projection_weight_property=... Weight property name. 
Example: "weight" +allPairsShortestPath() { + local nodeLabel; nodeLabel=$( extractQueryParameter "dependencies_projection_node" "${@}" ) + execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_5_All_pairs_shortest_path_distribution_per_project.cypher" "${@}" \ + > "${CURRENT_LEVEL_DIR}/${nodeLabel}_all_pairs_shortest_paths_distribution_per_project.csv" +} + +# Run the path finding algorithm "Longest Path" (for directed acyclic graphs). +# +# Required Parameters: +# - dependencies_projection=... Name prefix for the in-memory projection. Example: "artifact-path-finding" +# - dependencies_projection_node=... Node label. Example: "Artifact" +# - dependencies_projection_weight_property=... Weight property name. Example: "weight" +longestPath() { + local nodeLabel; nodeLabel=$( extractQueryParameter "dependencies_projection_node" "${@}" ) + execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_6_Longest_paths_distribution_per_project.cypher" "${@}" \ + > "${CURRENT_LEVEL_DIR}/${nodeLabel}_longest_paths_distribution.csv" +} + +# Run all path finding algorithms for the current abstraction level. 
+runPathFindingAlgorithms() { + time allPairsShortestPath "${@}" + time longestPath "${@}" +} + +# -- Java Artifact Path Finding -------------------------------------------- + +CURRENT_LEVEL_DIR="${FULL_REPORT_DIRECTORY}/Java_Artifact" +ARTIFACT_PROJECTION="dependencies_projection=artifact-path-finding" +ARTIFACT_NODE="dependencies_projection_node=Artifact" +ARTIFACT_WEIGHT="dependencies_projection_weight_property=weight" + +if createDirectedDependencyProjection "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}"; then + runPathFindingAlgorithms "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}" +fi + +# -- Java Package Path Finding --------------------------------------------- + +CURRENT_LEVEL_DIR="${FULL_REPORT_DIRECTORY}/Java_Package" +PACKAGE_PROJECTION="dependencies_projection=package-path-finding" +PACKAGE_NODE="dependencies_projection_node=Package" +PACKAGE_WEIGHT="dependencies_projection_weight_property=weight25PercentInterfaces" + +if createDirectedDependencyProjection "${PACKAGE_PROJECTION}" "${PACKAGE_NODE}" "${PACKAGE_WEIGHT}"; then + runPathFindingAlgorithms "${PACKAGE_PROJECTION}" "${PACKAGE_NODE}" "${PACKAGE_WEIGHT}" +fi + +# -- Java Type Path Finding (deactivated — too granular, too resource-intensive) -------- +#TYPE_PROJECTION="dependencies_projection=type-path-finding" +#TYPE_NODE="dependencies_projection_node=Type" +#TYPE_WEIGHT="dependencies_projection_weight_property=weight" +#if createDirectedJavaTypeDependencyProjection "${TYPE_PROJECTION}" "${TYPE_NODE}" "${TYPE_WEIGHT}"; then +# CURRENT_LEVEL_DIR="${FULL_REPORT_DIRECTORY}/Java_Type" +# runPathFindingAlgorithms "${TYPE_PROJECTION}" "${TYPE_NODE}" "${TYPE_WEIGHT}" +#fi + +# -- TypeScript Module Path Finding ---------------------------------------- + +CURRENT_LEVEL_DIR="${FULL_REPORT_DIRECTORY}/Typescript_Module" +MODULE_LANGUAGE="dependencies_projection_language=Typescript" +MODULE_PROJECTION="dependencies_projection=typescript-module-path-finding" 
+MODULE_NODE="dependencies_projection_node=Module" +MODULE_WEIGHT="dependencies_projection_weight_property=lowCouplingElement25PercentWeight" + +if createDirectedDependencyProjection "${MODULE_LANGUAGE}" "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}"; then + runPathFindingAlgorithms "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}" +fi + +# -- Non-Dev NPM Package Path Finding -------------------------------------- + +CURRENT_LEVEL_DIR="${FULL_REPORT_DIRECTORY}/NPM_NonDevPackage" +NPM_LANGUAGE="dependencies_projection_language=NPM" +NPM_PROJECTION="dependencies_projection=npm-non-dev-package-path-finding" +NPM_NODE="dependencies_projection_node=NpmNonDevPackage" +NPM_WEIGHT="dependencies_projection_weight_property=weightByDependencyType" + +if createDirectedDependencyProjection "${NPM_LANGUAGE}" "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}"; then + runPathFindingAlgorithms "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}" +fi + +# -- Dev NPM Package Path Finding ------------------------------------------ + +CURRENT_LEVEL_DIR="${FULL_REPORT_DIRECTORY}/NPM_DevPackage" +NPM_DEV_PROJECTION="dependencies_projection=npm-dev-package-path-finding" +NPM_DEV_NODE="dependencies_projection_node=NpmDevPackage" +NPM_DEV_WEIGHT="dependencies_projection_weight_property=weightByDependencyType" + +if createDirectedDependencyProjection "${NPM_LANGUAGE}" "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}"; then + runPathFindingAlgorithms "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}" +fi + +# ── Topological Sort ────────────────────────────────────────────────────────── + +echo "internalDependenciesCsv: $(date +'%Y-%m-%dT%H:%M:%S%z') Starting topological sort..." + +# Apply the algorithm "Topological Sort" and write results to the current level directory. +# +# Required Parameters: +# - dependencies_projection=... Name prefix for the in-memory projection. Example: "package-topology" +# - dependencies_projection_node=... Node label. 
Example: "Package" +# - dependencies_projection_weight_property=... Weight property name. Example: "weight" +topologicalSort() { + local nodeLabel; nodeLabel=$( extractQueryParameter "dependencies_projection_node" "${@}" ) + + # Write topological sort level as a node property (required for graph visualizations) + execute_cypher "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Write.cypher" "${@}" + + # Stream to CSV + execute_cypher "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Query.cypher" "${@}" \ + > "${CURRENT_LEVEL_DIR}/${nodeLabel}_Topological_Sort.csv" +} + +# -- Java Artifact Topological Sort ---------------------------------------- + +CURRENT_LEVEL_DIR="${FULL_REPORT_DIRECTORY}/Java_Artifact" +ARTIFACT_PROJECTION="dependencies_projection=artifact-topology" +ARTIFACT_NODE="dependencies_projection_node=Artifact" +ARTIFACT_WEIGHT="dependencies_projection_weight_property=weight" + +if createDirectedDependencyProjection "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}"; then + time topologicalSort "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}" +fi + +# -- Java Package Topological Sort ----------------------------------------- + +CURRENT_LEVEL_DIR="${FULL_REPORT_DIRECTORY}/Java_Package" +PACKAGE_PROJECTION="dependencies_projection=package-topology" +PACKAGE_NODE="dependencies_projection_node=Package" +PACKAGE_WEIGHT="dependencies_projection_weight_property=weight25PercentInterfaces" + +if createDirectedDependencyProjection "${PACKAGE_PROJECTION}" "${PACKAGE_NODE}" "${PACKAGE_WEIGHT}"; then + time topologicalSort "${PACKAGE_PROJECTION}" "${PACKAGE_NODE}" "${PACKAGE_WEIGHT}" +fi + +# -- Java Type Topological Sort -------------------------------------------- + +CURRENT_LEVEL_DIR="${FULL_REPORT_DIRECTORY}/Java_Type" +TYPE_PROJECTION="dependencies_projection=type-topology" +TYPE_NODE="dependencies_projection_node=Type" +TYPE_WEIGHT="dependencies_projection_weight_property=weight" + +if createDirectedJavaTypeDependencyProjection 
"${TYPE_PROJECTION}" "${TYPE_NODE}" "${TYPE_WEIGHT}"; then + time topologicalSort "${TYPE_PROJECTION}" "${TYPE_NODE}" "${TYPE_WEIGHT}" +fi + +# -- TypeScript Module Topological Sort ------------------------------------ + +CURRENT_LEVEL_DIR="${FULL_REPORT_DIRECTORY}/Typescript_Module" +MODULE_LANGUAGE="dependencies_projection_language=Typescript" +MODULE_PROJECTION="dependencies_projection=typescript-module-topology" +MODULE_NODE="dependencies_projection_node=Module" +MODULE_WEIGHT="dependencies_projection_weight_property=lowCouplingElement25PercentWeight" + +if createDirectedDependencyProjection "${MODULE_LANGUAGE}" "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}"; then + time topologicalSort "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}" +fi + +# -- Non-Dev NPM Package Topological Sort ---------------------------------- + +CURRENT_LEVEL_DIR="${FULL_REPORT_DIRECTORY}/NPM_NonDevPackage" +NPM_LANGUAGE="dependencies_projection_language=NPM" +NPM_PROJECTION="dependencies_projection=npm-non-dev-package-topology" +NPM_NODE="dependencies_projection_node=NpmNonDevPackage" +NPM_WEIGHT="dependencies_projection_weight_property=weightByDependencyType" + +if createDirectedDependencyProjection "${NPM_LANGUAGE}" "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}"; then + time topologicalSort "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}" +fi + +# -- Dev NPM Package Topological Sort -------------------------------------- + +CURRENT_LEVEL_DIR="${FULL_REPORT_DIRECTORY}/NPM_DevPackage" +NPM_DEV_PROJECTION="dependencies_projection=npm-dev-package-topology" +NPM_DEV_NODE="dependencies_projection_node=NpmDevPackage" +NPM_DEV_WEIGHT="dependencies_projection_weight_property=weightByDependencyType" + +if createDirectedDependencyProjection "${NPM_LANGUAGE}" "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}"; then + time topologicalSort "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}" +fi + +# ── Final clean-up 
──────────────────────────────────────────────────────────── + +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}/Java_Artifact" +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}/Java_Package" +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}/Java_Type" +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}/Typescript_Module" +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}/NPM_NonDevPackage" +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}/NPM_DevPackage" +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}" + +echo "internalDependenciesCsv: $(date +'%Y-%m-%dT%H:%M:%S%z') Successfully finished." diff --git a/domains/internal-dependencies/internalDependenciesMarkdown.sh b/domains/internal-dependencies/internalDependenciesMarkdown.sh new file mode 100755 index 000000000..620e05241 --- /dev/null +++ b/domains/internal-dependencies/internalDependenciesMarkdown.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash + +# This script is dynamically triggered by "MarkdownReports.sh" when report "All" or "Markdown" is enabled. +# It is designed as an entry point and delegates the execution to the dedicated "internalDependenciesSummary.sh" script that does the "heavy lifting". + +# Note that "scripts/prepareAnalysis.sh" is required to run prior to this script. + +# Requires internalDependenciesSummary.sh + +# Fail on any error ("-e" = exit on first error, "-o pipefail" exits on errors within piped commands) +set -o errexit -o pipefail + +# Overrideable Constants (defaults also defined in sub scripts) +REPORTS_DIRECTORY=${REPORTS_DIRECTORY:-"reports"} + +## Get this "domains/internal-dependencies" directory if not already set +# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution.
+# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes. +# This way non-standard tools like readlink aren't needed. +INTERNAL_DEPENDENCIES_SCRIPT_DIR=${INTERNAL_DEPENDENCIES_SCRIPT_DIR:-$(CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)} +# echo "internalDependenciesMarkdown: INTERNAL_DEPENDENCIES_SCRIPT_DIR=${INTERNAL_DEPENDENCIES_SCRIPT_DIR}" + +# Get the "summary" directory by taking the path of this script and selecting "summary". +INTERNAL_DEPENDENCIES_SUMMARY_DIR=${INTERNAL_DEPENDENCIES_SUMMARY_DIR:-"${INTERNAL_DEPENDENCIES_SCRIPT_DIR}/summary"} # Contains everything (scripts, templates) to create the Markdown summary report + +# Delegate the execution to the responsible script. +source "${INTERNAL_DEPENDENCIES_SUMMARY_DIR}/internalDependenciesSummary.sh" diff --git a/domains/internal-dependencies/internalDependenciesPython.sh b/domains/internal-dependencies/internalDependenciesPython.sh new file mode 100755 index 000000000..78731fe05 --- /dev/null +++ b/domains/internal-dependencies/internalDependenciesPython.sh @@ -0,0 +1,69 @@ +#!/usr/bin/env bash + +# Generates path finding charts as SVG files using Python. +# It requires that "internalDependenciesCsv.sh" has already run to produce the CSV data files. +# The results will be written into the sub directory reports/internal-dependencies. +# Dynamically triggered by "PythonReports.sh". + +# Note that "scripts/prepareAnalysis.sh" is required to run prior to this script. +# Note that "internalDependenciesCsv.sh" is required to run prior to this script +# so that path finding CSV files exist in the report directory. 
+ +# Fail on any error ("-e" = exit on first error, "-o pipefail" exits on errors within piped commands) +set -o errexit -o pipefail + +# Overrideable Constants (defaults also defined in sub scripts) +REPORTS_DIRECTORY=${REPORTS_DIRECTORY:-"reports"} + +## Get this "domains/internal-dependencies" directory if not already set +# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution. +# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes. +# This way non-standard tools like readlink aren't needed. +INTERNAL_DEPENDENCIES_SCRIPT_DIR=${INTERNAL_DEPENDENCIES_SCRIPT_DIR:-$(CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)} +echo "internalDependenciesPython: INTERNAL_DEPENDENCIES_SCRIPT_DIR=${INTERNAL_DEPENDENCIES_SCRIPT_DIR}" + +# Get the "scripts" directory by navigating two levels up from this domain directory. +SCRIPTS_DIR=${SCRIPTS_DIR:-"${INTERNAL_DEPENDENCIES_SCRIPT_DIR}/../../scripts"} + +# Function to display script usage +usage() { + echo -e "${COLOR_ERROR}" >&2 + echo "Usage: $0 [--verbose]" >&2 + echo -e "${COLOR_DEFAULT}" >&2 + exit 1 +} + +# Default values +verboseMode="" # either "" or "--verbose" + +# Parse command line arguments +while [[ $# -gt 0 ]]; do + key="$1" + + case ${key} in + --verbose) + verboseMode="--verbose" + ;; + *) + echo -e "${COLOR_ERROR}internalDependenciesPython: Error: Unknown option: ${key}${COLOR_DEFAULT}" >&2 + usage + ;; + esac + shift || true # ignore error when there are no more arguments +done + +# Create report directory +REPORT_NAME="internal-dependencies" +FULL_REPORT_DIRECTORY="${REPORTS_DIRECTORY}/${REPORT_NAME}" +mkdir -p "${FULL_REPORT_DIRECTORY}" + +echo "internalDependenciesPython: $(date +'%Y-%m-%dT%H:%M:%S%z') Starting path finding chart generation..."
+ +time python "${INTERNAL_DEPENDENCIES_SCRIPT_DIR}/pathFindingCharts.py" \ + --report_directory "${FULL_REPORT_DIRECTORY}" \ + ${verboseMode} + +# Clean-up after report generation. Empty reports will be deleted. +source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}" + +echo "internalDependenciesPython: $(date +'%Y-%m-%dT%H:%M:%S%z') Successfully finished." diff --git a/domains/internal-dependencies/internalDependenciesVisualization.sh b/domains/internal-dependencies/internalDependenciesVisualization.sh new file mode 100755 index 000000000..9d4ba397b --- /dev/null +++ b/domains/internal-dependencies/internalDependenciesVisualization.sh @@ -0,0 +1,27 @@ +#!/usr/bin/env bash + +# This script is dynamically triggered by "VisualizationReports.sh" when report "All" or "Visualization" is enabled. +# It is designed as an entry point and delegates the execution to the dedicated "internalDependenciesGraphs.sh" script that does the "heavy lifting". + +# Note that "scripts/prepareAnalysis.sh" is required to run prior to this script. + +# Requires internalDependenciesGraphs.sh + +# Fail on any error ("-e" = exit on first error, "-o pipefail" exits on errors within piped commands) +set -o errexit -o pipefail + +# Overrideable Constants (defaults also defined in sub scripts) +REPORTS_DIRECTORY=${REPORTS_DIRECTORY:-"reports"} + +## Get this "domains/internal-dependencies" directory if not already set +# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution. +# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes. +# This way non-standard tools like readlink aren't needed. +INTERNAL_DEPENDENCIES_SCRIPT_DIR=${INTERNAL_DEPENDENCIES_SCRIPT_DIR:-$(CDPATH=.
cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)} +# echo "internalDependenciesVisualization: INTERNAL_DEPENDENCIES_SCRIPT_DIR=${INTERNAL_DEPENDENCIES_SCRIPT_DIR}" + +# Get the "graphs" directory by taking the path of this script and selecting "graphs". +INTERNAL_DEPENDENCIES_GRAPHS_DIR=${INTERNAL_DEPENDENCIES_GRAPHS_DIR:-"${INTERNAL_DEPENDENCIES_SCRIPT_DIR}/graphs"} # Contains everything (scripts, queries) to create graph visualizations + +# Delegate the execution to the responsible script. +source "${INTERNAL_DEPENDENCIES_GRAPHS_DIR}/internalDependenciesGraphs.sh" diff --git a/domains/internal-dependencies/pathFindingCharts.py b/domains/internal-dependencies/pathFindingCharts.py new file mode 100644 index 000000000..7e130bbe3 --- /dev/null +++ b/domains/internal-dependencies/pathFindingCharts.py @@ -0,0 +1,467 @@ +#!/usr/bin/env python + +# Generates path finding charts as SVG files from CSV data produced by internalDependenciesCsv.sh. +# Charts are saved to the report directory subdirectories and referenced by the Markdown summary report. +# +# Input Parameters: +# --report_directory path to the report directory (contains Java_Artifact/, Java_Package/, etc.) +# --verbose optional finer-grained logging +# +# Prerequisites: +# - internalDependenciesCsv.sh must have run first to produce the required CSV files. 
+ +import os +import sys +import argparse +import typing + +import pandas as pd +import numpy as np + +import matplotlib +matplotlib.use('Agg') # Non-interactive backend — required for headless script execution +import matplotlib.pyplot as plot + +SCRIPT_NAME = "pathFindingCharts" + +# Column names from the path finding Cypher query results +DISTANCE_COLUMN = "distance" +PAIR_COUNT_COLUMN = "pairCount" +TOTAL_PAIR_COUNT_COLUMN = "distanceTotalPairCount" +SOURCE_PROJECT_COLUMN = "sourceProject" +IS_DIFFERENT_TARGET_PROJECT_COLUMN = "isDifferentTargetProject" + +# Abstraction level configurations: (subdirectory, nodeLabel, description) +ABSTRACTION_LEVELS = [ + ("Java_Package", "Package", "Java Package"), + ("Java_Artifact", "Artifact", "Java Artifact"), + ("Typescript_Module","Module", "TypeScript Module"), + ("NPM_NonDevPackage","NpmNonDevPackage", "NPM Non-Dev Package"), + ("NPM_DevPackage", "NpmDevPackage", "NPM Dev Package"), +] + +# Colormap matching the original PathFindingJava.ipynb notebook +MAIN_COLOR_MAP = "nipy_spectral" + + +class Parameters: + def __init__( + self, + report_directory: str, + verbose: bool, + ) -> None: + self.report_directory = report_directory + self.verbose = verbose + + def __repr__(self) -> str: + return ( + f"Parameters(" + f"report_directory={self.report_directory!r}, " + f"verbose={self.verbose})" + ) + + @staticmethod + def log_dependency_versions() -> None: + print("---------------------------------------") + print(f"Python version: {sys.version}") + from pandas import __version__ as pandas_version + print(f"pandas version: {pandas_version}") + from matplotlib import __version__ as matplotlib_version + print(f"matplotlib version: {matplotlib_version}") + print("---------------------------------------") + + +def parse_parameters() -> Parameters: + parser = argparse.ArgumentParser( + description="Generates path finding charts as SVG files from CSV data." 
+ ) + parser.add_argument( + "--report_directory", + type=str, + default="", + help="Path to the report directory containing abstraction-level subdirectories with CSV files", + ) + parser.add_argument( + "--verbose", + action="store_true", + default=False, + help="Enable verbose mode for detailed logging", + ) + args = parser.parse_args() + return Parameters( + report_directory=args.report_directory, + verbose=args.verbose, + ) + + +def chart_file_path(name: str, level_directory: str, verbose: bool) -> str: + """Returns the full SVG file path for a named chart within the level directory.""" + path = os.path.join(level_directory, name.replace(" ", "_") + ".svg") + if verbose: + print(f"{SCRIPT_NAME}: Saving {path}") + return path + + +def load_csv(csv_path: str, verbose: bool) -> typing.Optional[pd.DataFrame]: + """Loads a CSV file into a DataFrame. Returns None if the file does not exist or is empty.""" + if not os.path.isfile(csv_path): + if verbose: + print(f"{SCRIPT_NAME}: Skipping {csv_path} — file not found") + return None + data_frame = pd.read_csv(csv_path, sep=",") + if data_frame.empty: + if verbose: + print(f"{SCRIPT_NAME}: Skipping {csv_path} — file is empty") + return None + if verbose: + print(f"{SCRIPT_NAME}: Loaded {len(data_frame)} rows from {csv_path}") + return data_frame + + +# ── Pure data transformation functions ──────────────────────────────────────── + + +def aggregate_total_distribution(data_frame: pd.DataFrame) -> pd.DataFrame: + """Aggregates path count per distance across all projects.""" + return ( + data_frame + .groupby(DISTANCE_COLUMN)[PAIR_COUNT_COLUMN] + .sum() + .reset_index() + .sort_values(DISTANCE_COLUMN) + ) + + +def pivot_distribution_by_project( + data_frame: pd.DataFrame, + distance_column: str, + count_column: str, + project_column: str, +) -> pd.DataFrame: + """ + Pivots the per-project distribution into a wide format: + rows = distance, columns = project names, values = path counts. 
+ Only includes intra-project pairs (isDifferentTargetProject == False). + """ + intra = data_frame[data_frame[IS_DIFFERENT_TARGET_PROJECT_COLUMN] == False].copy() # noqa: E712 + if intra.empty: + return pd.DataFrame() + pivoted = intra.pivot_table( + index=distance_column, + columns=project_column, + values=count_column, + aggfunc="sum", + fill_value=0, + ) + return pivoted.sort_index() + + +def normalize_distribution_by_project(pivoted_data: pd.DataFrame) -> pd.DataFrame: + """Normalizes a pivoted distribution so each project column sums to 1.0 (100%).""" + column_sums = pivoted_data.sum(axis=0) + normalized = pivoted_data.div(column_sums, axis=1) + return normalized + + +def max_distance_per_project(data_frame: pd.DataFrame) -> pd.DataFrame: + """Returns the maximum distance (diameter) per source project, sorted descending.""" + return ( + data_frame + .groupby(SOURCE_PROJECT_COLUMN)[DISTANCE_COLUMN] + .max() + .reset_index() + .rename(columns={DISTANCE_COLUMN: "diameter"}) + .sort_values("diameter", ascending=False) + ) + + +# ── Chart generation functions ──────────────────────────────────────────────── + + +def plot_distribution_bar( + data_frame: pd.DataFrame, + distance_column: str, + count_column: str, + title: str, + file_path: str, +) -> None: + """Saves a bar chart showing path count per distance to file_path as SVG.""" + figure, axes = plot.subplots(figsize=(10, 5)) + data_frame.plot( + kind="bar", + x=distance_column, + y=count_column, + ax=axes, + legend=False, + grid=True, + fontsize=8, + cmap=MAIN_COLOR_MAP, + xlabel="Distance (number of hops)", + ylabel="Number of paths", + title=title, + ) + axes.tick_params(axis="x", labelrotation=45) + figure.tight_layout() + figure.savefig(file_path, format="svg") + plot.close(figure) + + +def plot_distribution_pie( + data_frame: pd.DataFrame, + distance_column: str, + count_column: str, + title: str, + file_path: str, +) -> None: + """Saves a pie chart showing percentage of paths per distance to file_path as 
SVG.""" + total = data_frame[count_column].sum() + explode = np.full(len(data_frame), 0.01) + + def autopct(pct: float) -> str: + return '{:1.2f}% ({:.0f})'.format(pct, total * pct / 100) + + figure, axes = plot.subplots(figsize=(8, 8)) + data_frame.plot( + kind="pie", + y=count_column, + labels=data_frame[distance_column], + labeldistance=None, + legend=True, + autopct=autopct, + explode=explode, + textprops={"fontsize": 8}, + pctdistance=1.2, + cmap=MAIN_COLOR_MAP, + ax=axes, + title=title, + ) + axes.set_ylabel("") + axes.legend(bbox_to_anchor=(1.05, 1), loc="upper left", title="distance") + figure.tight_layout() + figure.savefig(file_path, format="svg") + plot.close(figure) + + +def plot_per_project_stacked_bar( + pivoted_data: pd.DataFrame, + title: str, + file_path: str, + use_log_scale: bool = False, +) -> None: + """Saves a stacked bar chart per project (distances stacked) as SVG.""" + if pivoted_data.empty: + return + figure, axes = plot.subplots(figsize=(max(10, len(pivoted_data.columns) * 0.8 + 4), 6)) + pivoted_data.T.plot(kind="bar", stacked=True, ax=axes, colormap=MAIN_COLOR_MAP) + axes.set_xlabel("Project") + axes.set_ylabel("Number of paths" + (" (log scale)" if use_log_scale else "")) + axes.set_title(title) + axes.legend(title="Distance", bbox_to_anchor=(1.05, 1), loc="upper left", fontsize="small") + if use_log_scale: + axes.set_yscale("log") + axes.set_ylim(bottom=0.1) + axes.tick_params(axis="x", labelrotation=45) + figure.tight_layout() + figure.savefig(file_path, format="svg") + plot.close(figure) + + +def plot_per_project_normalized_bar( + normalized_data: pd.DataFrame, + title: str, + file_path: str, +) -> None: + """Saves a normalized stacked bar chart (each project sums to 100%) as SVG.""" + if normalized_data.empty: + return + figure, axes = plot.subplots(figsize=(max(10, len(normalized_data.columns) * 0.8 + 4), 6)) + (normalized_data * 100).T.plot(kind="bar", stacked=True, ax=axes, colormap=MAIN_COLOR_MAP) + axes.set_xlabel("Project") + 
axes.set_ylabel("Percentage of paths (%)") + axes.set_title(title) + axes.legend(title="Distance", bbox_to_anchor=(1.05, 1), loc="upper left", fontsize="small") + axes.tick_params(axis="x", labelrotation=45) + figure.tight_layout() + figure.savefig(file_path, format="svg") + plot.close(figure) + + +def plot_diameter_bar( + data_frame: pd.DataFrame, + project_column: str, + diameter_column: str, + title: str, + file_path: str, +) -> None: + """Saves a bar chart of graph diameter (max distance) per project, sorted descending.""" + if data_frame.empty: + return + display = data_frame.head(20) + figure, axes = plot.subplots(figsize=(max(10, len(display) * 0.8 + 2), 5)) + display.plot( + kind="bar", + x=project_column, + y=diameter_column, + ax=axes, + legend=False, + grid=True, + fontsize=8, + cmap=MAIN_COLOR_MAP, + xlabel="Project", + ylabel="Graph diameter (max shortest path)", + title=title, + ) + axes.tick_params(axis="x", labelrotation=45) + figure.tight_layout() + figure.savefig(file_path, format="svg") + plot.close(figure) + + +# ── Per-abstraction-level chart generation ──────────────────────────────────── + + +def generate_charts_for_level( + subdirectory: str, + node_label: str, + description: str, + report_directory: str, + verbose: bool, +) -> None: + """Generates all path finding charts for a single abstraction level.""" + level_directory = os.path.join(report_directory, subdirectory) + if not os.path.isdir(level_directory): + if verbose: + print(f"{SCRIPT_NAME}: Skipping {description} — directory not found: {level_directory}") + return + + all_pairs_shortest_paths_csv = os.path.join( + level_directory, + f"{node_label}_all_pairs_shortest_paths_distribution_per_project.csv", + ) + longest_csv = os.path.join( + level_directory, + f"{node_label}_longest_paths_distribution.csv", + ) + + # ── All pairs shortest path charts ──────────────────────────────────────── + all_pairs_shortest_paths_data = load_csv(all_pairs_shortest_paths_csv, verbose) + if 
all_pairs_shortest_paths_data is not None and not all_pairs_shortest_paths_data.empty: + print(f"{SCRIPT_NAME}: Generating All Pairs Shortest Path charts for {description}...") + + total_all_pairs_shortest_paths = aggregate_total_distribution(all_pairs_shortest_paths_data) + + if not total_all_pairs_shortest_paths.empty: + plot_distribution_bar( + total_all_pairs_shortest_paths, DISTANCE_COLUMN, PAIR_COUNT_COLUMN, + f"{description} — All Pairs Shortest Path Distribution", + chart_file_path(f"{subdirectory}_AllPairsShortestPath_Bar", level_directory, verbose), + ) + plot_distribution_pie( + total_all_pairs_shortest_paths, DISTANCE_COLUMN, PAIR_COUNT_COLUMN, + f"{description} — All Pairs Shortest Path by Distance", + chart_file_path(f"{subdirectory}_AllPairsShortestPath_Pie", level_directory, verbose), + ) + + pivoted_all_pairs_shortest_paths = pivot_distribution_by_project(all_pairs_shortest_paths_data, DISTANCE_COLUMN, PAIR_COUNT_COLUMN, SOURCE_PROJECT_COLUMN) + if not pivoted_all_pairs_shortest_paths.empty: + plot_per_project_stacked_bar( + pivoted_all_pairs_shortest_paths, + f"{description} — All Pairs Shortest Path per Project (absolute, log scale)", + chart_file_path(f"{subdirectory}_AllPairsShortestPath_StackedBar_Log", level_directory, verbose), + use_log_scale=True, + ) + normalized_all_pairs_shortest_paths = normalize_distribution_by_project(pivoted_all_pairs_shortest_paths) + plot_per_project_normalized_bar( + normalized_all_pairs_shortest_paths, + f"{description} — All Pairs Shortest Path per Project (normalised)", + chart_file_path(f"{subdirectory}_AllPairsShortestPath_StackedBar_Normalised", level_directory, verbose), + ) + + diameter_data = max_distance_per_project(all_pairs_shortest_paths_data) + if not diameter_data.empty: + plot_diameter_bar( + diameter_data, SOURCE_PROJECT_COLUMN, "diameter", + f"{description} — Graph Diameter per Project", + chart_file_path(f"{subdirectory}_GraphDiameter_per_Project", level_directory, verbose), + ) + + # ── 
Longest path charts ──────────────────────────────────────────────────── + longest_data = load_csv(longest_csv, verbose) + if longest_data is not None and not longest_data.empty: + print(f"{SCRIPT_NAME}: Generating Longest Path charts for {description}...") + + total_longest = aggregate_total_distribution(longest_data) + + if not total_longest.empty: + plot_distribution_bar( + total_longest, DISTANCE_COLUMN, PAIR_COUNT_COLUMN, + f"{description} — Longest Path Distribution", + chart_file_path(f"{subdirectory}_LongestPath_Bar", level_directory, verbose), + ) + plot_distribution_pie( + total_longest, DISTANCE_COLUMN, PAIR_COUNT_COLUMN, + f"{description} — Longest Path by Distance", + chart_file_path(f"{subdirectory}_LongestPath_Pie", level_directory, verbose), + ) + + pivoted_longest = pivot_distribution_by_project(longest_data, DISTANCE_COLUMN, PAIR_COUNT_COLUMN, SOURCE_PROJECT_COLUMN) + if not pivoted_longest.empty: + plot_per_project_stacked_bar( + pivoted_longest, + f"{description} — Longest Path per Project (absolute, log scale)", + chart_file_path(f"{subdirectory}_LongestPath_StackedBar_Log", level_directory, verbose), + use_log_scale=True, + ) + normalized_longest = normalize_distribution_by_project(pivoted_longest) + plot_per_project_normalized_bar( + normalized_longest, + f"{description} — Longest Path per Project (normalised)", + chart_file_path(f"{subdirectory}_LongestPath_StackedBar_Normalised", level_directory, verbose), + ) + + max_longest = max_distance_per_project(longest_data) + if not max_longest.empty: + plot_diameter_bar( + max_longest, SOURCE_PROJECT_COLUMN, "diameter", + f"{description} — Max Longest Path per Project", + chart_file_path(f"{subdirectory}_MaxLongestPath_per_Project", level_directory, verbose), + ) + + +# ── Main ────────────────────────────────────────────────────────────────────── + + +def main() -> None: + parameters = parse_parameters() + + if parameters.verbose: + print(parameters) + Parameters.log_dependency_versions() + + if 
not parameters.report_directory: + print(f"{SCRIPT_NAME}: Error: --report_directory is required.", file=sys.stderr) + sys.exit(1) + + if not os.path.isdir(parameters.report_directory): + print( + f"{SCRIPT_NAME}: Error: report directory does not exist: {parameters.report_directory}", + file=sys.stderr, + ) + sys.exit(1) + + print(f"{SCRIPT_NAME}: Generating path finding charts in {parameters.report_directory}...") + + for subdirectory, node_label, description in ABSTRACTION_LEVELS: + generate_charts_for_level( + subdirectory=subdirectory, + node_label=node_label, + description=description, + report_directory=parameters.report_directory, + verbose=parameters.verbose, + ) + + print(f"{SCRIPT_NAME}: Successfully finished.") + + +if __name__ == "__main__": + main() diff --git a/cypher/Cyclic_Dependencies/Cyclic_Dependencies.cypher b/domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies.cypher similarity index 100% rename from cypher/Cyclic_Dependencies/Cyclic_Dependencies.cypher rename to domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies.cypher diff --git a/cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown.cypher b/domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies_Breakdown.cypher similarity index 100% rename from cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown.cypher rename to domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies_Breakdown.cypher diff --git a/cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown_Backward_Only.cypher b/domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies_Breakdown_Backward_Only.cypher similarity index 100% rename from cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown_Backward_Only.cypher rename to domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies_Breakdown_Backward_Only.cypher diff --git 
a/cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.cypher b/domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.cypher similarity index 100% rename from cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.cypher rename to domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.cypher diff --git a/cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown_for_Typescript.cypher b/domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies_Breakdown_for_Typescript.cypher similarity index 100% rename from cypher/Cyclic_Dependencies/Cyclic_Dependencies_Breakdown_for_Typescript.cypher rename to domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies_Breakdown_for_Typescript.cypher diff --git a/cypher/Cyclic_Dependencies/Cyclic_Dependencies_between_Artifacts_as_unwinded_List.cypher b/domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies_between_Artifacts_as_unwinded_List.cypher similarity index 100% rename from cypher/Cyclic_Dependencies/Cyclic_Dependencies_between_Artifacts_as_unwinded_List.cypher rename to domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies_between_Artifacts_as_unwinded_List.cypher diff --git a/cypher/Cyclic_Dependencies/Cyclic_Dependencies_for_Typescript.cypher b/domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies_for_Typescript.cypher similarity index 100% rename from cypher/Cyclic_Dependencies/Cyclic_Dependencies_for_Typescript.cypher rename to domains/internal-dependencies/queries/cyclic-dependencies/Cyclic_Dependencies_for_Typescript.cypher diff --git a/domains/internal-dependencies/queries/exploration/Annotated_code_elements.cypher b/domains/internal-dependencies/queries/exploration/Annotated_code_elements.cypher new file mode 100644 index 
000000000..f034ce842 --- /dev/null +++ b/domains/internal-dependencies/queries/exploration/Annotated_code_elements.cypher @@ -0,0 +1,33 @@ +// Annotated code elements overall by element type with some examples + +MATCH (annotatedElement:Annotation|Parameter|Field|Method|Type&!ExternalType&!JavaType)-[:ANNOTATED_BY]->()-[]->(annotation:Type) +OPTIONAL MATCH (parameterOwnerType:Type)-[:DECLARES]->(parameterOwnerMethod:Method)-[:HAS]->(annotatedElement:Parameter) +OPTIONAL MATCH (memberDeclaringType:Type)-[:DECLARES]->(annotatedElement:Member) + WITH annotation + ,annotatedElement + ,coalesce(annotatedElement.fqn + ,annotatedElement.fileName + ,memberDeclaringType.fqn + '.' + annotatedElement.name + ,parameterOwnerType.fqn + '.' + parameterOwnerMethod.name + '(' + annotatedElement.index + ')' + ,annotatedElement.name + ) AS nameOfAnnotatedElement + WITH annotation.fqn AS annotationName + ,CASE WHEN 'Annotation' IN labels(annotatedElement) THEN 'Annotation' + WHEN 'Parameter' IN labels(annotatedElement) THEN 'Parameter' + WHEN 'Field' IN labels(annotatedElement) THEN 'Field' + WHEN 'Constructor' IN labels(annotatedElement) THEN 'Constructor' + WHEN 'Method' IN labels(annotatedElement) THEN 'Method' + WHEN 'Member' IN labels(annotatedElement) THEN 'Member' + WHEN 'Class' IN labels(annotatedElement) THEN 'Class' + WHEN 'Interface' IN labels(annotatedElement) THEN 'Interface' + WHEN 'Enum' IN labels(annotatedElement) THEN 'Enum' + WHEN 'Type' IN labels(annotatedElement) THEN 'Type' + ELSE 'Unexpected' + END AS languageElement + ,count(DISTINCT annotatedElement) AS numberOfAnnotatedElements + ,collect(DISTINCT nameOfAnnotatedElement) AS annotatedElements +RETURN annotationName + ,languageElement + ,numberOfAnnotatedElements + ,annotatedElements[0..9] AS examples +ORDER BY numberOfAnnotatedElements DESCENDING \ No newline at end of file diff --git a/domains/internal-dependencies/queries/exploration/Artifacts_with_duplicate_packages.cypher 
b/domains/internal-dependencies/queries/exploration/Artifacts_with_duplicate_packages.cypher new file mode 100644 index 000000000..3235a2eb1 --- /dev/null +++ b/domains/internal-dependencies/queries/exploration/Artifacts_with_duplicate_packages.cypher @@ -0,0 +1,12 @@ +// Artifacts with the same fully qualified package name (duplicate packages). These can cause confusion and may unintentionally give another artifact access to package-protected classes. Requires "Add_file_name and_extension.cypher". + + MATCH (artifact:Artifact)-[:CONTAINS]->(package:Package) + WHERE EXISTS ((package)-[:CONTAINS]->(:Type)) + WITH package.fqn AS packageName + ,artifact.name AS artifactName + WITH packageName + ,collect(DISTINCT artifactName) AS artifactNames + ,count(*) AS duplicates + WHERE duplicates > 1 +RETURN packageName, duplicates, artifactNames + ORDER BY duplicates DESCENDING \ No newline at end of file diff --git a/domains/internal-dependencies/queries/internal-dependencies/Candidates_for_Interface_Segregation.cypher b/domains/internal-dependencies/queries/internal-dependencies/Candidates_for_Interface_Segregation.cypher new file mode 100644 index 000000000..5e15f925a --- /dev/null +++ b/domains/internal-dependencies/queries/internal-dependencies/Candidates_for_Interface_Segregation.cypher @@ -0,0 +1,41 @@ +// Candidates for Interface Segregation +// Lists Java interfaces that declare many methods but where callers only use a small subset. +// These are candidates to be split into smaller, more focused interfaces (ISP). 
+// Column descriptions: +// - fullQualifiedTypeName: FQN of the interface that may be too broad +// - declaredMethodCount: total public methods (declared + inherited from super-interfaces) +// - distinctCalledMethodCount: how many distinct methods callers actually invoke +// - usageRatio: distinctCalledMethodCount / declaredMethodCount (lower = stronger candidate) +// - callerCount: number of distinct caller types using only that subset +// - exampleCalledMethods: the actual method names callers are using + +MATCH (type:Type)-[:DECLARES]->(method:Method)-[:INVOKES]->(dependentMethod:Method) +MATCH (dependentMethod)<-[:DECLARES]-(dependentType:Type&Interface) +MATCH (dependentType)-[:IMPLEMENTS*1..9]->(superType:Type)-[:DECLARES]->(inheritedMethod:Method) +WHERE type.fqn <> dependentType.fqn + AND dependentMethod.name IS NOT NULL + AND inheritedMethod.name IS NOT NULL + AND dependentMethod.name <> '<init>' // ignore constructors + AND inheritedMethod.name <> '<init>' // ignore constructors + WITH type.fqn AS fullTypeName + ,dependentType.fqn AS fullQualifiedTypeName + ,collect(DISTINCT dependentMethod.name) AS exampleCalledMethods + ,count(DISTINCT dependentMethod) AS distinctCalledMethodCount + // Count the different signatures without the return type + // of all declared methods including the inherited ones + ,count(DISTINCT split(method.signature, ' ')[1]) + count(DISTINCT split(inheritedMethod.signature, ' ')[1]) AS declaredMethodCount +// Filter out types that declare only a few more methods than those that are actually used. +// A good interface segregation candidate declares a lot of methods where only a few of them are used widely. 
+WHERE declaredMethodCount > distinctCalledMethodCount + 2 + WITH fullQualifiedTypeName + ,declaredMethodCount + ,exampleCalledMethods + ,distinctCalledMethodCount + ,count(DISTINCT fullTypeName) AS callerCount + RETURN fullQualifiedTypeName + ,declaredMethodCount + ,distinctCalledMethodCount + ,round(toFloat(distinctCalledMethodCount) / declaredMethodCount * 100) / 100 AS usageRatio + ,callerCount + ,exampleCalledMethods + ORDER BY callerCount DESC, declaredMethodCount DESC, fullQualifiedTypeName \ No newline at end of file diff --git a/domains/internal-dependencies/queries/internal-dependencies/Get_file_distance_as_shortest_contains_path_for_dependencies.cypher b/domains/internal-dependencies/queries/internal-dependencies/Get_file_distance_as_shortest_contains_path_for_dependencies.cypher new file mode 100644 index 000000000..67196271e --- /dev/null +++ b/domains/internal-dependencies/queries/internal-dependencies/Get_file_distance_as_shortest_contains_path_for_dependencies.cypher @@ -0,0 +1,17 @@ +// Get file distance distribution for dependencies (intuitively the fewest number of change directory commands needed) + + MATCH (source:File)-[dependency:DEPENDS_ON]->(target:File) + WHERE dependency.fileDistanceAsFewestChangeDirectoryCommands IS NOT NULL + WITH count(*) AS totalNumberOfDependencies + ,collect(dependency) AS dependencies + UNWIND dependencies AS dependency + WITH * + ,startNode(dependency) AS source + ,endNode(dependency) AS target + RETURN dependency.fileDistanceAsFewestChangeDirectoryCommands AS directoryDistance + ,count(*) AS numberOfDependencies + ,round(count(*) / (max(totalNumberOfDependencies) + 1E-38) * 100, 2) AS percentageOfDependencies + ,count(DISTINCT source) AS numberOfDependencyUsers + ,count(DISTINCT target) AS numberOfDependencyProviders + ,collect(source.fileName + ' uses ' + target.fileName)[0..4] AS examples + ORDER BY directoryDistance \ No newline at end of file diff --git 
a/cypher/Internal_Dependencies/How_many_classes_compared_to_all_existing_in_the_same_package_are_used_by_dependent_packages_across_different_artifacts.cypher b/domains/internal-dependencies/queries/internal-dependencies/How_many_classes_compared_to_all_existing_in_the_same_package_are_used_by_dependent_packages_across_different_artifacts.cypher similarity index 100% rename from cypher/Internal_Dependencies/How_many_classes_compared_to_all_existing_in_the_same_package_are_used_by_dependent_packages_across_different_artifacts.cypher rename to domains/internal-dependencies/queries/internal-dependencies/How_many_classes_compared_to_all_existing_in_the_same_package_are_used_by_dependent_packages_across_different_artifacts.cypher diff --git a/cypher/Internal_Dependencies/How_many_elements_compared_to_all_existing_are_used_by_dependent_modules_for_Typescript.cypher b/domains/internal-dependencies/queries/internal-dependencies/How_many_elements_compared_to_all_existing_are_used_by_dependent_modules_for_Typescript.cypher similarity index 100% rename from cypher/Internal_Dependencies/How_many_elements_compared_to_all_existing_are_used_by_dependent_modules_for_Typescript.cypher rename to domains/internal-dependencies/queries/internal-dependencies/How_many_elements_compared_to_all_existing_are_used_by_dependent_modules_for_Typescript.cypher diff --git a/cypher/Internal_Dependencies/How_many_packages_compared_to_all_existing_are_used_by_dependent_artifacts.cypher b/domains/internal-dependencies/queries/internal-dependencies/How_many_packages_compared_to_all_existing_are_used_by_dependent_artifacts.cypher similarity index 100% rename from cypher/Internal_Dependencies/How_many_packages_compared_to_all_existing_are_used_by_dependent_artifacts.cypher rename to domains/internal-dependencies/queries/internal-dependencies/How_many_packages_compared_to_all_existing_are_used_by_dependent_artifacts.cypher diff --git 
a/cypher/Internal_Dependencies/Inter_scan_and_project_dependencies_of_Typescript_modules.cypher b/domains/internal-dependencies/queries/internal-dependencies/Inter_scan_and_project_dependencies_of_Typescript_modules.cypher similarity index 100% rename from cypher/Internal_Dependencies/Inter_scan_and_project_dependencies_of_Typescript_modules.cypher rename to domains/internal-dependencies/queries/internal-dependencies/Inter_scan_and_project_dependencies_of_Typescript_modules.cypher diff --git a/cypher/Internal_Dependencies/Java_Artifact_build_levels_for_graphviz.cypher b/domains/internal-dependencies/queries/internal-dependencies/Java_Artifact_build_levels_for_graphviz.cypher similarity index 94% rename from cypher/Internal_Dependencies/Java_Artifact_build_levels_for_graphviz.cypher rename to domains/internal-dependencies/queries/internal-dependencies/Java_Artifact_build_levels_for_graphviz.cypher index 83c802b3b..b142b907e 100644 --- a/cypher/Internal_Dependencies/Java_Artifact_build_levels_for_graphviz.cypher +++ b/domains/internal-dependencies/queries/internal-dependencies/Java_Artifact_build_levels_for_graphviz.cypher @@ -6,6 +6,7 @@ WITH min(dependencyForStatistics.weight) AS minWeight ,max(dependencyForStatistics.weight) AS maxWeight ,max(targetForStatistics.maxDistanceFromSource) AS maxLevel + WITH *, CASE WHEN minWeight = maxWeight THEN maxWeight + 1 ELSE maxWeight END AS maxWeight MATCH (source:Java:Artifact)-[dependency:DEPENDS_ON]->(target:Java:Artifact) WHERE source.maxDistanceFromSource IS NOT NULL AND target.maxDistanceFromSource IS NOT NULL diff --git a/cypher/Internal_Dependencies/List_all_Java_artifacts.cypher b/domains/internal-dependencies/queries/internal-dependencies/List_all_Java_artifacts.cypher similarity index 80% rename from cypher/Internal_Dependencies/List_all_Java_artifacts.cypher rename to domains/internal-dependencies/queries/internal-dependencies/List_all_Java_artifacts.cypher index d8d8c11ce..c735dfdd0 100644 --- 
a/cypher/Internal_Dependencies/List_all_Java_artifacts.cypher +++ b/domains/internal-dependencies/queries/internal-dependencies/List_all_Java_artifacts.cypher @@ -6,4 +6,5 @@ MATCH (artifact:Java:Artifact)-[:CONTAINS]->(package:Java:Package)-[:CONTAINS]-> ,artifact.outgoingDependencies AS outgoingDependencies ,COUNT(DISTINCT package.fqn) AS packages ,COUNT(DISTINCT type.fqn) AS types -RETURN artifactName, packages, types, incomingDependencies, outgoingDependencies \ No newline at end of file +RETURN artifactName, packages, types, incomingDependencies, outgoingDependencies +ORDER BY packages DESC, types DESC, incomingDependencies DESC, outgoingDependencies DESC, artifactName \ No newline at end of file diff --git a/cypher/Internal_Dependencies/List_all_Typescript_modules.cypher b/domains/internal-dependencies/queries/internal-dependencies/List_all_Typescript_modules.cypher similarity index 100% rename from cypher/Internal_Dependencies/List_all_Typescript_modules.cypher rename to domains/internal-dependencies/queries/internal-dependencies/List_all_Typescript_modules.cypher diff --git a/cypher/Internal_Dependencies/List_elements_that_are_used_by_many_different_modules_for_Typescript.cypher b/domains/internal-dependencies/queries/internal-dependencies/List_elements_that_are_used_by_many_different_modules_for_Typescript.cypher similarity index 100% rename from cypher/Internal_Dependencies/List_elements_that_are_used_by_many_different_modules_for_Typescript.cypher rename to domains/internal-dependencies/queries/internal-dependencies/List_elements_that_are_used_by_many_different_modules_for_Typescript.cypher diff --git a/domains/internal-dependencies/queries/internal-dependencies/List_types_that_are_used_by_many_different_packages.cypher b/domains/internal-dependencies/queries/internal-dependencies/List_types_that_are_used_by_many_different_packages.cypher new file mode 100644 index 000000000..8630a8a30 --- /dev/null +++ 
b/domains/internal-dependencies/queries/internal-dependencies/List_types_that_are_used_by_many_different_packages.cypher @@ -0,0 +1,17 @@ +// List types that are used by many different packages + + MATCH (artifact:Artifact)-[:CONTAINS]->(package:Package)-[:CONTAINS]->(type:Type)-[:DEPENDS_ON]->(dependentType:Type)<-[:CONTAINS]-(dependentPackage:Package)<-[:CONTAINS]-(dependentArtifact:Artifact) + WHERE package <> dependentPackage + WITH dependentType + ,labels(dependentType) AS dependentTypeLabels + ,COUNT(DISTINCT package.fqn) AS numberOfUsingPackages +UNWIND dependentTypeLabels AS dependentTypeLabel + WITH * + WHERE NOT dependentTypeLabel STARTS WITH 'Mark4' + AND NOT dependentTypeLabel IN ['File', 'ByteCode', 'Java', 'Type'] + WITH dependentType, collect(dependentTypeLabel) AS dependentTypeLabels, numberOfUsingPackages +RETURN dependentType.fqn AS fullQualifiedDependentTypeName + ,dependentType.name AS dependentTypeName + ,dependentTypeLabels[0] + coalesce(' ' + dependentTypeLabels[1], '') AS dependentTypeLabel + ,numberOfUsingPackages + ORDER BY numberOfUsingPackages DESC, dependentTypeName ASC \ No newline at end of file diff --git a/cypher/Internal_Dependencies/NPM_Package_build_levels_for_graphviz.cypher b/domains/internal-dependencies/queries/internal-dependencies/NPM_Package_build_levels_for_graphviz.cypher similarity index 100% rename from cypher/Internal_Dependencies/NPM_Package_build_levels_for_graphviz.cypher rename to domains/internal-dependencies/queries/internal-dependencies/NPM_Package_build_levels_for_graphviz.cypher diff --git a/cypher/Internal_Dependencies/Set_file_distance_as_shortest_contains_path_for_dependencies.cypher b/domains/internal-dependencies/queries/internal-dependencies/Set_file_distance_as_shortest_contains_path_for_dependencies.cypher similarity index 100% rename from cypher/Internal_Dependencies/Set_file_distance_as_shortest_contains_path_for_dependencies.cypher rename to 
domains/internal-dependencies/queries/internal-dependencies/Set_file_distance_as_shortest_contains_path_for_dependencies.cypher diff --git a/cypher/Internal_Dependencies/Typescript_Module_build_levels_for_graphviz.cypher b/domains/internal-dependencies/queries/internal-dependencies/Typescript_Module_build_levels_for_graphviz.cypher similarity index 69% rename from cypher/Internal_Dependencies/Typescript_Module_build_levels_for_graphviz.cypher rename to domains/internal-dependencies/queries/internal-dependencies/Typescript_Module_build_levels_for_graphviz.cypher index 262067d47..5956aa98e 100644 --- a/cypher/Internal_Dependencies/Typescript_Module_build_levels_for_graphviz.cypher +++ b/domains/internal-dependencies/queries/internal-dependencies/Typescript_Module_build_levels_for_graphviz.cypher @@ -3,14 +3,15 @@ MATCH (sourceForStatistics:TS:Module)-[dependencyForStatistics:DEPENDS_ON]->(targetForStatistics:TS:Module) WHERE sourceForStatistics.maxDistanceFromSource IS NOT NULL AND targetForStatistics.maxDistanceFromSource IS NOT NULL - WITH min(dependencyForStatistics.weight) AS minWeight - ,max(dependencyForStatistics.weight) AS maxWeight + WITH min(dependencyForStatistics.cardinality) AS minCardinality + ,max(dependencyForStatistics.cardinality) AS maxCardinality ,max(targetForStatistics.maxDistanceFromSource) AS maxLevel + WITH *, CASE WHEN minCardinality = maxCardinality THEN maxCardinality + 1 ELSE maxCardinality END AS maxCardinality MATCH (source:TS:Module)-[dependency:DEPENDS_ON]->(target:TS:Module) WHERE source.maxDistanceFromSource IS NOT NULL AND target.maxDistanceFromSource IS NOT NULL - WITH *, toFloat(dependency.cardinality - minWeight) / toFloat(maxWeight - minWeight) AS normalizedWeight - WITH *, round((normalizedWeight * 5) + 1, 2) AS penWidth + WITH *, toFloat(dependency.cardinality - minCardinality) / toFloat(maxCardinality - minCardinality) AS normalizedCardinality + WITH *, round((normalizedCardinality * 5) + 1, 2) AS penWidth WITH *, "\\n(level 
" + coalesce(source.maxDistanceFromSource + "/" + maxLevel, "?") + ")" AS sourceLevelInfo WITH *, "\\n(level " + coalesce(target.maxDistanceFromSource + "/" + maxLevel, "?") + ")" AS targetLevelInfo WITH *, source.rootProjectName + "\\n" + source.name + sourceLevelInfo AS fullSourceName @@ -20,14 +21,14 @@ + " penwidth = " + penWidth + ";" + " ];" AS graphVizDotNotationEdge WITH *, "\"" + fullSourceName + coalesce(graphVizDotNotationEdge, "\" [];") AS graphVizDotNotationLine - ORDER BY dependency.weight DESC, target.maxDistanceFromSource DESC + ORDER BY dependency.cardinality DESC, target.maxDistanceFromSource DESC RETURN graphVizDotNotationLine //Debugging //,source.name AS sourceName //,target.name AS targetName //,penWidth - //,normalizedWeight - //,dependency.cardinality AS weight - //,minWeight - //,maxWeight + //,normalizedCardinality + //,dependency.cardinality AS cardinality + //,minCardinality + //,maxCardinality LIMIT 440 \ No newline at end of file diff --git a/cypher/Path_Finding/Path_Finding_1_Create_Projection.cypher b/domains/internal-dependencies/queries/path-finding/Path_Finding_1_Create_Projection.cypher similarity index 100% rename from cypher/Path_Finding/Path_Finding_1_Create_Projection.cypher rename to domains/internal-dependencies/queries/path-finding/Path_Finding_1_Create_Projection.cypher diff --git a/cypher/Path_Finding/Path_Finding_2_Estimate_Memory.cypher b/domains/internal-dependencies/queries/path-finding/Path_Finding_2_Estimate_Memory.cypher similarity index 100% rename from cypher/Path_Finding/Path_Finding_2_Estimate_Memory.cypher rename to domains/internal-dependencies/queries/path-finding/Path_Finding_2_Estimate_Memory.cypher diff --git a/cypher/Path_Finding/Path_Finding_3_Depth_First_Search_Path.cypher b/domains/internal-dependencies/queries/path-finding/Path_Finding_3_Depth_First_Search_Path.cypher similarity index 100% rename from cypher/Path_Finding/Path_Finding_3_Depth_First_Search_Path.cypher rename to 
domains/internal-dependencies/queries/path-finding/Path_Finding_3_Depth_First_Search_Path.cypher diff --git a/cypher/Path_Finding/Path_Finding_4_Breadth_First_Search_Path.cypher b/domains/internal-dependencies/queries/path-finding/Path_Finding_4_Breadth_First_Search_Path.cypher similarity index 100% rename from cypher/Path_Finding/Path_Finding_4_Breadth_First_Search_Path.cypher rename to domains/internal-dependencies/queries/path-finding/Path_Finding_4_Breadth_First_Search_Path.cypher diff --git a/cypher/Path_Finding/Path_Finding_5_All_pairs_shortest_path_distribution_overall.cypher b/domains/internal-dependencies/queries/path-finding/Path_Finding_5_All_pairs_shortest_path_distribution_overall.cypher similarity index 96% rename from cypher/Path_Finding/Path_Finding_5_All_pairs_shortest_path_distribution_overall.cypher rename to domains/internal-dependencies/queries/path-finding/Path_Finding_5_All_pairs_shortest_path_distribution_overall.cypher index 985c84a78..53fb06104 100644 --- a/cypher/Path_Finding/Path_Finding_5_All_pairs_shortest_path_distribution_overall.cypher +++ b/domains/internal-dependencies/queries/path-finding/Path_Finding_5_All_pairs_shortest_path_distribution_overall.cypher @@ -14,4 +14,4 @@ RETURN distance ,count(DISTINCT sourceNodeId) AS sourceNodeCount ,count(DISTINCT targetNodeId) AS targetNodeCount ,collect(DISTINCT source.fileName + ' ->' + target.fileName)[0..2] AS examples -ORDER BY distance \ No newline at end of file +ORDER BY distance DESC \ No newline at end of file diff --git a/cypher/Path_Finding/Path_Finding_5_All_pairs_shortest_path_distribution_per_project.cypher b/domains/internal-dependencies/queries/path-finding/Path_Finding_5_All_pairs_shortest_path_distribution_per_project.cypher similarity index 100% rename from cypher/Path_Finding/Path_Finding_5_All_pairs_shortest_path_distribution_per_project.cypher rename to 
domains/internal-dependencies/queries/path-finding/Path_Finding_5_All_pairs_shortest_path_distribution_per_project.cypher diff --git a/cypher/Path_Finding/Path_Finding_5_All_pairs_shortest_path_examples.cypher b/domains/internal-dependencies/queries/path-finding/Path_Finding_5_All_pairs_shortest_path_examples.cypher similarity index 100% rename from cypher/Path_Finding/Path_Finding_5_All_pairs_shortest_path_examples.cypher rename to domains/internal-dependencies/queries/path-finding/Path_Finding_5_All_pairs_shortest_path_examples.cypher diff --git a/cypher/Path_Finding/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher b/domains/internal-dependencies/queries/path-finding/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher similarity index 100% rename from cypher/Path_Finding/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher rename to domains/internal-dependencies/queries/path-finding/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher diff --git a/cypher/Path_Finding/Path_Finding_6_Longest_paths_distribution_overall.cypher b/domains/internal-dependencies/queries/path-finding/Path_Finding_6_Longest_paths_distribution_overall.cypher similarity index 62% rename from cypher/Path_Finding/Path_Finding_6_Longest_paths_distribution_overall.cypher rename to domains/internal-dependencies/queries/path-finding/Path_Finding_6_Longest_paths_distribution_overall.cypher index 34ab731fb..5f288814f 100644 --- a/cypher/Path_Finding/Path_Finding_6_Longest_paths_distribution_overall.cypher +++ b/domains/internal-dependencies/queries/path-finding/Path_Finding_6_Longest_paths_distribution_overall.cypher @@ -2,6 +2,6 @@ CALL gds.dag.longestPath.stream($dependencies_projection + '-cleaned') YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path -RETURN toInteger(totalCost) AS totalCost - ,count(*) AS nodeCount -ORDER BY totalCost \ No newline at end of file +RETURN toInteger(totalCost) AS distance + ,count(*) AS pathsCount +ORDER BY 
distance DESC \ No newline at end of file diff --git a/cypher/Path_Finding/Path_Finding_6_Longest_paths_distribution_per_project.cypher b/domains/internal-dependencies/queries/path-finding/Path_Finding_6_Longest_paths_distribution_per_project.cypher similarity index 100% rename from cypher/Path_Finding/Path_Finding_6_Longest_paths_distribution_per_project.cypher rename to domains/internal-dependencies/queries/path-finding/Path_Finding_6_Longest_paths_distribution_per_project.cypher diff --git a/cypher/Path_Finding/Path_Finding_6_Longest_paths_examples.cypher b/domains/internal-dependencies/queries/path-finding/Path_Finding_6_Longest_paths_examples.cypher similarity index 100% rename from cypher/Path_Finding/Path_Finding_6_Longest_paths_examples.cypher rename to domains/internal-dependencies/queries/path-finding/Path_Finding_6_Longest_paths_examples.cypher diff --git a/cypher/Path_Finding/Path_Finding_6_Longest_paths_for_graphviz.cypher b/domains/internal-dependencies/queries/path-finding/Path_Finding_6_Longest_paths_for_graphviz.cypher similarity index 100% rename from cypher/Path_Finding/Path_Finding_6_Longest_paths_for_graphviz.cypher rename to domains/internal-dependencies/queries/path-finding/Path_Finding_6_Longest_paths_for_graphviz.cypher diff --git a/cypher/Path_Finding/Set_Parameters.cypher b/domains/internal-dependencies/queries/path-finding/Set_Parameters.cypher similarity index 95% rename from cypher/Path_Finding/Set_Parameters.cypher rename to domains/internal-dependencies/queries/path-finding/Set_Parameters.cypher index 55582f533..f6e2171ed 100644 --- a/cypher/Path_Finding/Set_Parameters.cypher +++ b/domains/internal-dependencies/queries/path-finding/Set_Parameters.cypher @@ -3,5 +3,5 @@ :params { "dependencies_projection": "package-path-finding", "dependencies_projection_node": "Package", - "dependencies_projection_weight_property": "weight25PercentInterfaces", + "dependencies_projection_weight_property": "weight25PercentInterfaces" } \ No newline at 
end of file diff --git a/cypher/Path_Finding/Set_Parameters_NonDevNpmPackage.cypher b/domains/internal-dependencies/queries/path-finding/Set_Parameters_NonDevNpmPackage.cypher similarity index 97% rename from cypher/Path_Finding/Set_Parameters_NonDevNpmPackage.cypher rename to domains/internal-dependencies/queries/path-finding/Set_Parameters_NonDevNpmPackage.cypher index e032d5534..ec24af307 100644 --- a/cypher/Path_Finding/Set_Parameters_NonDevNpmPackage.cypher +++ b/domains/internal-dependencies/queries/path-finding/Set_Parameters_NonDevNpmPackage.cypher @@ -4,5 +4,5 @@ "dependencies_projection_language":"NPM", "dependencies_projection": "npm-non-dev-package-path-finding", "dependencies_projection_node": "NpmNonDevPackage", - "dependencies_projection_weight_property": "weightByDependencyType", + "dependencies_projection_weight_property": "weightByDependencyType" } \ No newline at end of file diff --git a/cypher/Path_Finding/Set_Parameters_Typescript_Module.cypher b/domains/internal-dependencies/queries/path-finding/Set_Parameters_Typescript_Module.cypher similarity index 94% rename from cypher/Path_Finding/Set_Parameters_Typescript_Module.cypher rename to domains/internal-dependencies/queries/path-finding/Set_Parameters_Typescript_Module.cypher index 0d2ac9fc3..e7aef3bf2 100644 --- a/cypher/Path_Finding/Set_Parameters_Typescript_Module.cypher +++ b/domains/internal-dependencies/queries/path-finding/Set_Parameters_Typescript_Module.cypher @@ -4,5 +4,5 @@ "dependencies_projection_language":"Typescript", "dependencies_projection": "typescript-module-path-finding", "dependencies_projection_node": "Module", - "dependencies_projection_weight_property": "lowCouplingElement25PercentWeight", + "dependencies_projection_weight_property": "lowCouplingElement25PercentWeight" } \ No newline at end of file diff --git a/cypher/Topological_Sort/Set_Parameters.cypher b/domains/internal-dependencies/queries/topological-sort/Set_Parameters.cypher similarity index 100% rename from 
cypher/Topological_Sort/Set_Parameters.cypher rename to domains/internal-dependencies/queries/topological-sort/Set_Parameters.cypher diff --git a/domains/internal-dependencies/queries/topological-sort/Topological_Sort_Critical_Path_Length.cypher b/domains/internal-dependencies/queries/topological-sort/Topological_Sort_Critical_Path_Length.cypher new file mode 100644 index 000000000..54ed010fa --- /dev/null +++ b/domains/internal-dependencies/queries/topological-sort/Topological_Sort_Critical_Path_Length.cypher @@ -0,0 +1,26 @@ +// Critical path lengths (max build level) per abstraction level after topological sort. +// The maxDistanceFromSource property is set by the Topological Sort algorithm. +// Level 0 = no dependencies (can be built first). Higher level = more transitive dependents above it. +// The maximum level equals the minimum number of sequential build steps even with full parallelism. +// Needs graph-data-science plugin version >= 2.5.0 + +CALL { + MATCH (n:Artifact) WHERE n.maxDistanceFromSource IS NOT NULL + RETURN 'Java Artifact' AS abstractionLevel + ,max(n.maxDistanceFromSource) AS maxBuildLevel + ,count(n) AS nodeCount + UNION ALL + MATCH (n:Package) WHERE n.maxDistanceFromSource IS NOT NULL + RETURN 'Java Package' AS abstractionLevel + ,max(n.maxDistanceFromSource) AS maxBuildLevel + ,count(n) AS nodeCount + UNION ALL + MATCH (n:Module) WHERE n.maxDistanceFromSource IS NOT NULL + RETURN 'TypeScript Module' AS abstractionLevel + ,max(n.maxDistanceFromSource) AS maxBuildLevel + ,count(n) AS nodeCount +} +RETURN abstractionLevel + ,nodeCount + ,maxBuildLevel + ORDER BY abstractionLevel diff --git a/cypher/Topological_Sort/Topological_Sort_Exists.cypher b/domains/internal-dependencies/queries/topological-sort/Topological_Sort_Exists.cypher similarity index 100% rename from cypher/Topological_Sort/Topological_Sort_Exists.cypher rename to domains/internal-dependencies/queries/topological-sort/Topological_Sort_Exists.cypher diff --git 
a/cypher/Topological_Sort/Topological_Sort_List.cypher b/domains/internal-dependencies/queries/topological-sort/Topological_Sort_List.cypher similarity index 100% rename from cypher/Topological_Sort/Topological_Sort_List.cypher rename to domains/internal-dependencies/queries/topological-sort/Topological_Sort_List.cypher diff --git a/cypher/Topological_Sort/Topological_Sort_Query.cypher b/domains/internal-dependencies/queries/topological-sort/Topological_Sort_Query.cypher similarity index 100% rename from cypher/Topological_Sort/Topological_Sort_Query.cypher rename to domains/internal-dependencies/queries/topological-sort/Topological_Sort_Query.cypher diff --git a/cypher/Topological_Sort/Topological_Sort_Write.cypher b/domains/internal-dependencies/queries/topological-sort/Topological_Sort_Write.cypher similarity index 100% rename from cypher/Topological_Sort/Topological_Sort_Write.cypher rename to domains/internal-dependencies/queries/topological-sort/Topological_Sort_Write.cypher diff --git a/domains/internal-dependencies/summary/internalDependenciesSummary.sh b/domains/internal-dependencies/summary/internalDependenciesSummary.sh new file mode 100755 index 000000000..34f200674 --- /dev/null +++ b/domains/internal-dependencies/summary/internalDependenciesSummary.sh @@ -0,0 +1,410 @@ +#!/usr/bin/env bash + +# Creates a Markdown report summarising all internal dependency analysis results. +# It requires an already running Neo4j graph database with already scanned and analyzed artifacts. +# The results will be written into the sub directory reports/internal-dependencies. +# Dynamically triggered by "MarkdownReports.sh" via "internalDependenciesMarkdown.sh". + +# Note that "scripts/prepareAnalysis.sh" is required to run prior to this script. +# Note that either "internalDependenciesCsv.sh" or "internalDependenciesVisualization.sh" +# is required to run prior to this script. 
+ +# Requires executeQueryFunctions.sh, projectionFunctions.sh, cleanupAfterReportGeneration.sh + +# Fail on any error ("-e" = exit on first error, "-o pipefail" exits on errors within piped commands) +set -o errexit -o pipefail + +# Overridable constants (defaults also defined in sub scripts) +REPORTS_DIRECTORY=${REPORTS_DIRECTORY:-"reports"} +MARKDOWN_INCLUDES_DIRECTORY=${MARKDOWN_INCLUDES_DIRECTORY:-"includes"} # Subdirectory that contains Markdown files to be included by the Markdown template for the report. + +## Get this "domains/internal-dependencies/summary" directory if not already set +# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution. +# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes. +# This way non-standard tools like readlink aren't needed. +INTERNAL_DEPENDENCIES_SUMMARY_DIR=${INTERNAL_DEPENDENCIES_SUMMARY_DIR:-$(CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)} +#echo "internalDependenciesSummary: INTERNAL_DEPENDENCIES_SUMMARY_DIR=${INTERNAL_DEPENDENCIES_SUMMARY_DIR}" + +# Get the "scripts" directory by navigating three levels up from this summary directory. 
+SCRIPTS_DIR=${SCRIPTS_DIR:-"${INTERNAL_DEPENDENCIES_SUMMARY_DIR}/../../../scripts"} +MARKDOWN_SCRIPTS_DIR=${MARKDOWN_SCRIPTS_DIR:-"${SCRIPTS_DIR}/markdown"} + +# Cypher query directories within this domain +INTERNAL_DEPS_QUERY_CYPHER_DIR="${INTERNAL_DEPENDENCIES_SUMMARY_DIR}/../queries/internal-dependencies" +CYCLIC_DEPS_QUERY_CYPHER_DIR="${INTERNAL_DEPENDENCIES_SUMMARY_DIR}/../queries/cyclic-dependencies" +TOPOLOGICAL_SORT_SUMMARY_CYPHER_DIR="${INTERNAL_DEPENDENCIES_SUMMARY_DIR}/../queries/topological-sort" +PATH_FINDING_CYPHER_DIR="${INTERNAL_DEPENDENCIES_SUMMARY_DIR}/../queries/path-finding" + +# Define functions to execute a cypher query from within a given file (first and only argument) like "execute_cypher" +source "${SCRIPTS_DIR}/executeQueryFunctions.sh" + +# Define functions to create and delete Graph Projections like "createDirectedDependencyProjection" or "projectionExists" +source "${SCRIPTS_DIR}/projectionFunctions.sh" + +# ── Front matter ────────────────────────────────────────────────────────────── + +internal_dependencies_front_matter() { + local current_date + current_date="$(date +'%Y-%m-%d')" + + local latest_tag + latest_tag="$(git for-each-ref --sort=-creatordate --count=1 --format '%(refname:short)' refs/tags)" + + local analysis_directory + analysis_directory="${PWD##*/}" + + echo "---" + echo "title: \"Internal Dependencies Report\"" + echo "generated: \"${current_date}\"" + echo "model_version: \"${latest_tag}\"" + echo "dataset: \"${analysis_directory}\"" + echo "authors: [\"JohT/code-graph-analysis-pipeline\"]" + echo "---" +} + +# ── SVG chart reference helpers ─────────────────────────────────────────────── + +# Emits a Markdown image reference for a chart SVG if the file exists, otherwise nothing. 
+include_svg_if_exists() { + local svg_file="${FULL_REPORT_DIRECTORY}/${1}" + local alt_text="${2}" + if [ -f "${svg_file}" ]; then + echo "" + echo "![${alt_text}](./${1})" + echo "" + fi +} + +# Emits Markdown image references for every SVG matching the given glob pattern, sorted. +include_svgs_matching() { + local base_dir="${1}" + local pattern="${2}" + [ -d "${base_dir}" ] || return 0 # if the base directory doesn't exist, just return without emitting anything + find "${base_dir}" -maxdepth 1 -type f -name "${pattern}" | sort | while read -r svg_file; do + local chart_filename + chart_filename=$(basename -- "${svg_file}") + local rel_path="${base_dir#"${FULL_REPORT_DIRECTORY}/"}/${chart_filename}" + local chart_label="${chart_filename%.*}" + echo "" + echo "![${chart_label}](./${rel_path})" + done +} + +# ── Report assembly helpers ─────────────────────────────────────────────────── + +# Limits a piped Markdown table to at most 10 data rows (header + separator kept in full). +limit_markdown_table() { + awk '/^\|[| :-]*-[| :-]*\|/ { sep=1; print; next } !sep { print } sep && ++rows <= 10 { print }' +} + +# Runs a Cypher query as a Markdown table (top 10 rows) and appends a CSV download link if the CSV exists. +# Arguments: cypher_file, csv_path (relative to the report directory), output_file +execute_limited_table() { + local cypher_file="${1}" + local csv_path="${2}" + local output_file="${3}" + { + execute_cypher "${cypher_file}" --output-markdown-table | limit_markdown_table + local full_csv="${FULL_REPORT_DIRECTORY}/${csv_path}" + if [ -f "${full_csv}" ]; then + echo "" + echo "[Full data](./${csv_path})" + fi + } > "${output_file}" +} + +# ── Report assembly ─────────────────────────────────────────────────────────── + +assemble_internal_dependencies_report() { + echo "internalDependenciesSummary: $(date +'%Y-%m-%dT%H:%M:%S%z') Assembling Markdown report..." 
+ + local report_include_directory="${FULL_REPORT_DIRECTORY}/${MARKDOWN_INCLUDES_DIRECTORY}" + mkdir -p "${report_include_directory}" + + # -- Write front matter ------------------------------------------------ + internal_dependencies_front_matter > "${report_include_directory}/InternalDependenciesReportFrontMatter.md" + + # ── Java cyclic dependencies ─────────────────────────────────────────── + + execute_limited_table \ + "${CYCLIC_DEPS_QUERY_CYPHER_DIR}/Cyclic_Dependencies.cypher" \ + "Java_Package/Cyclic_Dependencies.csv" \ + "${report_include_directory}/Cyclic_Dependencies.md" + + execute_limited_table \ + "${CYCLIC_DEPS_QUERY_CYPHER_DIR}/Cyclic_Dependencies_Breakdown.cypher" \ + "Java_Package/Cyclic_Dependencies_Breakdown.csv" \ + "${report_include_directory}/Cyclic_Dependencies_Breakdown.md" + + execute_limited_table \ + "${CYCLIC_DEPS_QUERY_CYPHER_DIR}/Cyclic_Dependencies_Breakdown_Backward_Only.cypher" \ + "Java_Package/Cyclic_Dependencies_Breakdown_Backward_Only.csv" \ + "${report_include_directory}/Cyclic_Dependencies_Breakdown_Backward_Only.md" + + execute_limited_table \ + "${CYCLIC_DEPS_QUERY_CYPHER_DIR}/Cyclic_Dependencies_between_Artifacts_as_unwinded_List.cypher" \ + "Java_Artifact/CyclicArtifactDependenciesUnwinded.csv" \ + "${report_include_directory}/Cyclic_Dependencies_between_Artifacts.md" + + # ── TypeScript cyclic dependencies ──────────────────────────────────── + + execute_limited_table \ + "${CYCLIC_DEPS_QUERY_CYPHER_DIR}/Cyclic_Dependencies_for_Typescript.cypher" \ + "Typescript_Module/Cyclic_Dependencies_for_Typescript.csv" \ + "${report_include_directory}/Cyclic_Dependencies_for_Typescript.md" + + execute_limited_table \ + "${CYCLIC_DEPS_QUERY_CYPHER_DIR}/Cyclic_Dependencies_Breakdown_for_Typescript.cypher" \ + "Typescript_Module/Cyclic_Dependencies_Breakdown_for_Typescript.csv" \ + "${report_include_directory}/Cyclic_Dependencies_Breakdown_for_Typescript.md" + + execute_limited_table \ + 
"${CYCLIC_DEPS_QUERY_CYPHER_DIR}/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.cypher" \ + "Typescript_Module/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.csv" \ + "${report_include_directory}/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.md" + + # ── Java internal structure ──────────────────────────────────────────── + + execute_limited_table \ + "${INTERNAL_DEPS_QUERY_CYPHER_DIR}/List_all_Java_artifacts.cypher" \ + "Java_Artifact/List_all_Java_artifacts.csv" \ + "${report_include_directory}/List_all_Java_artifacts.md" + + execute_limited_table \ + "${INTERNAL_DEPS_QUERY_CYPHER_DIR}/Candidates_for_Interface_Segregation.cypher" \ + "Java_Package/InterfaceSegregationCandidates.csv" \ + "${report_include_directory}/Candidates_for_Interface_Segregation.md" + + execute_limited_table \ + "${INTERNAL_DEPS_QUERY_CYPHER_DIR}/List_types_that_are_used_by_many_different_packages.cypher" \ + "Java_Package/WidelyUsedTypes.csv" \ + "${report_include_directory}/List_types_that_are_used_by_many_different_packages.md" + + execute_limited_table \ + "${INTERNAL_DEPS_QUERY_CYPHER_DIR}/How_many_packages_compared_to_all_existing_are_used_by_dependent_artifacts.cypher" \ + "Java_Artifact/ArtifactPackageUsage.csv" \ + "${report_include_directory}/How_many_packages_used_by_dependent_artifacts.md" + + execute_limited_table \ + "${INTERNAL_DEPS_QUERY_CYPHER_DIR}/How_many_classes_compared_to_all_existing_in_the_same_package_are_used_by_dependent_packages_across_different_artifacts.cypher" \ + "Java_Artifact/ClassesPerPackageUsageAcrossArtifacts.csv" \ + "${report_include_directory}/How_many_classes_used_by_dependent_packages.md" + + execute_limited_table \ + "${INTERNAL_DEPS_QUERY_CYPHER_DIR}/Get_file_distance_as_shortest_contains_path_for_dependencies.cypher" \ + "Distance_distribution_between_dependent_files.csv" \ + "${report_include_directory}/File_distance_distribution.md" + + # ── TypeScript internal structure 
────────────────────────────────────── + + execute_limited_table \ + "${INTERNAL_DEPS_QUERY_CYPHER_DIR}/List_all_Typescript_modules.cypher" \ + "Typescript_Module/List_all_Typescript_modules.csv" \ + "${report_include_directory}/List_all_Typescript_modules.md" + + execute_limited_table \ + "${INTERNAL_DEPS_QUERY_CYPHER_DIR}/List_elements_that_are_used_by_many_different_modules_for_Typescript.cypher" \ + "Typescript_Module/WidelyUsedTypescriptElements.csv" \ + "${report_include_directory}/List_elements_used_by_many_modules.md" + + execute_limited_table \ + "${INTERNAL_DEPS_QUERY_CYPHER_DIR}/How_many_elements_compared_to_all_existing_are_used_by_dependent_modules_for_Typescript.cypher" \ + "Typescript_Module/ModuleElementsUsageTypescript.csv" \ + "${report_include_directory}/How_many_elements_used_by_dependent_modules.md" + + # ── Path finding tables (Java Artifact) ────────────────────────────────── + # Guard: only run if Java Artifact path finding projection exists. + ARTIFACT_PROJECTION="dependencies_projection=artifact-path-finding" + if projectionExists "${ARTIFACT_PROJECTION}"; then + { + execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_5_All_pairs_shortest_path_distribution_overall.cypher" \ + "${ARTIFACT_PROJECTION}" \ + --output-markdown-table | limit_markdown_table + if [ -f "${FULL_REPORT_DIRECTORY}/Java_Artifact/Artifact_all_pairs_shortest_paths_distribution_per_project.csv" ]; then + echo "" + echo "[Full data per project](./Java_Artifact/Artifact_all_pairs_shortest_paths_distribution_per_project.csv)" + fi + } > "${report_include_directory}/JavaArtifactAllPairsShortestPathDistribution.md" + { + execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_6_Longest_paths_distribution_overall.cypher" \ + "${ARTIFACT_PROJECTION}" \ + --output-markdown-table | limit_markdown_table + if [ -f "${FULL_REPORT_DIRECTORY}/Java_Artifact/Artifact_longest_paths_distribution.csv" ]; then + echo "" + echo "[Full data per 
project](./Java_Artifact/Artifact_longest_paths_distribution.csv)" + fi + } > "${report_include_directory}/JavaArtifactLongestPathDistribution.md" + fi + + # ── Path finding tables (Java Package) ───────────────────────────────── + # Guard: only run if Java Package path finding projection exists. + PACKAGE_PROJECTION="dependencies_projection=package-path-finding" + if projectionExists "${PACKAGE_PROJECTION}"; then + { + execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_5_All_pairs_shortest_path_distribution_overall.cypher" \ + "${PACKAGE_PROJECTION}" \ + --output-markdown-table | limit_markdown_table + if [ -f "${FULL_REPORT_DIRECTORY}/Java_Package/Package_all_pairs_shortest_paths_distribution_per_project.csv" ]; then + echo "" + echo "[Full data per project](./Java_Package/Package_all_pairs_shortest_paths_distribution_per_project.csv)" + fi + } > "${report_include_directory}/JavaPackageAllPairsShortestPathDistribution.md" + { + execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_6_Longest_paths_distribution_overall.cypher" \ + "${PACKAGE_PROJECTION}" \ + --output-markdown-table | limit_markdown_table + if [ -f "${FULL_REPORT_DIRECTORY}/Java_Package/Package_longest_paths_distribution.csv" ]; then + echo "" + echo "[Full data per project](./Java_Package/Package_longest_paths_distribution.csv)" + fi + } > "${report_include_directory}/JavaPackageLongestPathDistribution.md" + fi + + # ── Path finding SVG chart references (Java Package) ────────────────── + { + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Java_Package" "Java_Package_AllPairsShortestPath_Bar.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Java_Package" "Java_Package_AllPairsShortestPath_Pie.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Java_Package" "Java_Package_AllPairsShortestPath_StackedBar_Log.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Java_Package" "Java_Package_AllPairsShortestPath_StackedBar_Normalised.svg" + include_svgs_matching 
"${FULL_REPORT_DIRECTORY}/Java_Package" "Java_Package_GraphDiameter_per_Project.svg" + } > "${report_include_directory}/JavaPackageAllPairsShortestPathCharts.md" + + { + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Java_Package" "Java_Package_LongestPath_Bar.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Java_Package" "Java_Package_LongestPath_Pie.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Java_Package" "Java_Package_LongestPath_StackedBar_Log.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Java_Package" "Java_Package_LongestPath_StackedBar_Normalised.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Java_Package" "Java_Package_MaxLongestPath_per_Project.svg" + } > "${report_include_directory}/JavaPackageLongestPathCharts.md" + + # ── Path finding SVG chart references (Java Artifact) ───────────────── + { + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Java_Artifact" "Java_Artifact_AllPairsShortestPath_Bar.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Java_Artifact" "Java_Artifact_AllPairsShortestPath_Pie.svg" + } > "${report_include_directory}/JavaArtifactAllPairsShortestPathCharts.md" + + { + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Java_Artifact" "Java_Artifact_LongestPath_Bar.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Java_Artifact" "Java_Artifact_LongestPath_Pie.svg" + } > "${report_include_directory}/JavaArtifactLongestPathCharts.md" + + # ── Path finding tables (TypeScript Module) ─────────────────────────────── + # Guard: only run if TypeScript path finding projection exists. 
+ MODULE_PROJECTION="dependencies_projection=typescript-module-path-finding" + if projectionExists "${MODULE_PROJECTION}"; then + { + execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_5_All_pairs_shortest_path_distribution_overall.cypher" \ + "${MODULE_PROJECTION}" \ + --output-markdown-table | limit_markdown_table + if [ -f "${FULL_REPORT_DIRECTORY}/Typescript_Module/Module_all_pairs_shortest_paths_distribution_per_project.csv" ]; then + echo "" + echo "[Full data per project](./Typescript_Module/Module_all_pairs_shortest_paths_distribution_per_project.csv)" + fi + } > "${report_include_directory}/TypescriptModuleAllPairsShortestPathDistribution.md" + { + execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_6_Longest_paths_distribution_overall.cypher" \ + "${MODULE_PROJECTION}" \ + --output-markdown-table | limit_markdown_table + if [ -f "${FULL_REPORT_DIRECTORY}/Typescript_Module/Module_longest_paths_distribution.csv" ]; then + echo "" + echo "[Full data per project](./Typescript_Module/Module_longest_paths_distribution.csv)" + fi + } > "${report_include_directory}/TypescriptModuleLongestPathDistribution.md" + fi + + # ── Path finding SVG chart references (TypeScript Module) ───────────── + { + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Typescript_Module" "Typescript_Module_AllPairsShortestPath_Bar.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Typescript_Module" "Typescript_Module_AllPairsShortestPath_Pie.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Typescript_Module" "Typescript_Module_AllPairsShortestPath_StackedBar_Log.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Typescript_Module" "Typescript_Module_AllPairsShortestPath_StackedBar_Normalised.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Typescript_Module" "Typescript_Module_GraphDiameter_per_Project.svg" + } > "${report_include_directory}/TypescriptModuleAllPairsShortestPathCharts.md" + + { + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Typescript_Module" 
"Typescript_Module_LongestPath_Bar.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Typescript_Module" "Typescript_Module_LongestPath_Pie.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Typescript_Module" "Typescript_Module_LongestPath_StackedBar_Log.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Typescript_Module" "Typescript_Module_LongestPath_StackedBar_Normalised.svg" + include_svgs_matching "${FULL_REPORT_DIRECTORY}/Typescript_Module" "Typescript_Module_MaxLongestPath_per_Project.svg" + } > "${report_include_directory}/TypescriptModuleLongestPathCharts.md" + + # ── Graph visualization SVG references (Java Artifact) ──────────────── + { + include_svg_if_exists "Java_Artifact/Graph_Visualizations/JavaArtifactBuildLevels.svg" \ + "Java Artifact Build Levels" + include_svg_if_exists "Java_Artifact/Graph_Visualizations/JavaArtifactLongestPathsIsolated.svg" \ + "Java Artifact Longest Paths (Isolated)" + include_svg_if_exists "Java_Artifact/Graph_Visualizations/JavaArtifactLongestPaths.svg" \ + "Java Artifact Longest Paths (with contributors)" + } > "${report_include_directory}/JavaArtifactGraphVisualizations.md" + + # ── Graph visualization SVG references (TypeScript Module) ──────────── + { + include_svg_if_exists "Typescript_Module/Graph_Visualizations/TypeScriptModuleBuildLevels.svg" \ + "TypeScript Module Build Levels" + include_svg_if_exists "Typescript_Module/Graph_Visualizations/TypeScriptModuleLongestPathsIsolated.svg" \ + "TypeScript Module Longest Paths (Isolated)" + include_svg_if_exists "Typescript_Module/Graph_Visualizations/TypeScriptModuleLongestPaths.svg" \ + "TypeScript Module Longest Paths (with contributors)" + } > "${report_include_directory}/TypescriptModuleGraphVisualizations.md" + + # ── Graph visualization SVG references (NPM Packages) ───────────────── + { + include_svg_if_exists "NPM_NonDevPackage/Graph_Visualizations/NpmPackageBuildLevels.svg" \ + "NPM Package Build Levels" + include_svg_if_exists 
"NPM_NonDevPackage/Graph_Visualizations/NpmNonDevPackageLongestPathsIsolated.svg" \ + "NPM Non-Dev Package Longest Paths (Isolated)" + include_svg_if_exists "NPM_NonDevPackage/Graph_Visualizations/NpmNonDevPackageLongestPaths.svg" \ + "NPM Non-Dev Package Longest Paths (with contributors)" + include_svg_if_exists "NPM_DevPackage/Graph_Visualizations/NpmDevPackageLongestPathsIsolated.svg" \ + "NPM Dev Package Longest Paths (Isolated)" + include_svg_if_exists "NPM_DevPackage/Graph_Visualizations/NpmDevPackageLongestPaths.svg" \ + "NPM Dev Package Longest Paths (with contributors)" + } > "${report_include_directory}/NpmPackageGraphVisualizations.md" + + # ── Topological sort critical path length KPI ───────────────────────── + + execute_limited_table \ + "${TOPOLOGICAL_SORT_SUMMARY_CYPHER_DIR}/Topological_Sort_Critical_Path_Length.cypher" \ + "" \ + "${report_include_directory}/Topological_Sort_Critical_Path_Length.md" + + # -- Remove empty Markdown includes ------------------------------------ + source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${report_include_directory}" + + # -- Create fallback empty file for optional includes ------------------ + echo "" > "${report_include_directory}/empty.md" + + # -- Copy no-Java-data fallback template -------------------------- + cp -f "${INTERNAL_DEPENDENCIES_SUMMARY_DIR}/report_no_java_data.template.md" \ + "${report_include_directory}/report_no_java_data.template.md" + + # -- Copy no-TypeScript-data fallback template -------------------------- + cp -f "${INTERNAL_DEPENDENCIES_SUMMARY_DIR}/report_no_typescript_data.template.md" \ + "${report_include_directory}/report_no_typescript_data.template.md" + + # -- Copy no-cycles fallback template ---------------------------------- + cp -f "${INTERNAL_DEPENDENCIES_SUMMARY_DIR}/report_no_cycles_data.template.md" \ + "${report_include_directory}/report_no_cycles_data.template.md" + + # -- Assemble final report from template -------------------------------- + cp -f 
"${INTERNAL_DEPENDENCIES_SUMMARY_DIR}/report.template.md" "${FULL_REPORT_DIRECTORY}/report.template.md" + cat "${FULL_REPORT_DIRECTORY}/report.template.md" \ + | "${MARKDOWN_SCRIPTS_DIR}/embedMarkdownIncludes.sh" "${report_include_directory}" \ + > "${FULL_REPORT_DIRECTORY}/internal_dependencies_report.md" + + rm -rf "${FULL_REPORT_DIRECTORY}/report.template.md" + rm -rf "${report_include_directory}" + + echo "internalDependenciesSummary: $(date +'%Y-%m-%dT%H:%M:%S%z') Successfully finished." +} + +# ── Main ────────────────────────────────────────────────────────────────────── + +# Create report directory +REPORT_NAME="internal-dependencies" +FULL_REPORT_DIRECTORY="${REPORTS_DIRECTORY}/${REPORT_NAME}" +mkdir -p "${FULL_REPORT_DIRECTORY}" + +assemble_internal_dependencies_report diff --git a/domains/internal-dependencies/summary/report.template.md b/domains/internal-dependencies/summary/report.template.md new file mode 100644 index 000000000..595258431 --- /dev/null +++ b/domains/internal-dependencies/summary/report.template.md @@ -0,0 +1,312 @@ + + +# 🔗 Internal Dependencies Report + +## 1. Executive Overview + +This report analyses how **internal packages, artifacts, TypeScript modules, and NPM packages depend on each other** within the codebase. It covers four interconnected topics: + +- **Cyclic dependencies** — mutual dependency cycles that complicate builds and impede refactoring +- **Internal structure** — interface segregation, widely used types, and usage ratios between modules +- **Path finding** — shortest and longest path distributions revealing dependency chain depth and complexity +- **Topological sort** — build ordering derived from the acyclic view of the dependency graph + +> **What to act on first?** +> Cyclic dependencies are the highest-priority structural smell — they make modular decomposition impossible. +> After eliminating cycles, use path finding results to identify and reduce excessive transitive depth. 
+> **Reading the tables**: Rows are sorted by priority — the **first rows are the most critical** and should be addressed first. + +## 📚 Table of Contents + +1. [Executive Overview](#1-executive-overview) +1. [Cyclic Dependencies](#2-cyclic-dependencies) +1. [Java Internal Structure](#3-java-internal-structure) +1. [TypeScript Internal Structure](#4-typescript-internal-structure) +1. [Path Finding](#5-path-finding) +1. [Topological Sort](#6-topological-sort) +1. [Graph Visualizations](#7-graph-visualizations) +1. [Glossary and Column Definitions](#8-glossary-and-column-definitions) + +--- + +## 2. Cyclic Dependencies + +A **cycle group** is a set of code units (packages, modules) that mutually depend on each other — A depends on B and B depends on A, directly or transitively. Cyclic dependencies prevent independent compilation, complicate testing, and make architectural layering impossible. + +The `forwardToBackwardBalance` metric identifies the easiest fix path: dependencies within a cycle are classified as _forward_ (in the direction of the cycle group majority) or _backward_ (against it). Removing or reversing a backward dependency dissolves the cycle. A balance near 1.0 means almost all dependencies are forward — there are very few backward ones to remove. A balance near 0.0 means many backward dependencies exist, making the cycle harder to resolve. + +See [Section 3.1](#31-java-artifact-listing) for artifact sizes and dependency counts — useful for gauging the impact of cycles found here. + +### 2.1 Java Package Cyclic Dependencies (Overview) + +Each row represents one cycle group. `numberForward` counts how many dependencies flow in the direction of the cycle; `numberBackward` counts those going against it. Sorted by `forwardToBackwardBalance` descending — the top rows are the easiest to fix (fewest backward dependencies to remove). 
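As a rough illustration of how to read this column, the sketch below computes a balance value from `numberForward` and `numberBackward`. It assumes the forward-to-total ratio given in the glossary of this report. The function name `forward_to_backward_balance` is hypothetical, and the actual Cypher query may compute the value differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the pipeline.
# Assumption: balance = numberForward / (numberForward + numberBackward),
# following the glossary wording "ratio of forward-to-total dependencies".
forward_to_backward_balance() {
    local numberForward="$1"
    local numberBackward="$2"
    awk -v forward="${numberForward}" -v backward="${numberBackward}" 'BEGIN {
        total = forward + backward
        if (total == 0) { print "n/a"; exit }   # no dependencies, no ratio
        printf "%.2f\n", forward / total
    }'
}

forward_to_backward_balance 9 1   # prints 0.90 (one backward dependency to remove)
forward_to_backward_balance 5 5   # prints 0.50 (as many backward as forward)
```

Under this assumption, a cycle group with nine forward dependencies and one backward dependency sorts above a group with an even split, because only a single dependency has to be removed or reversed.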
+ + + +### 2.2 Java Package Cyclic Dependencies (Breakdown) + +Expands each cycle group into individual dependency pairs in `type1 → type2` format, showing both forward and backward dependencies. Use this to understand the concrete classes involved in each cycle. + + + +### 2.3 Java Package Cyclic Dependencies (Backward Only) + +Shows only the _backward_ dependencies — those going against the majority flow within the cycle group. These are the highest-value candidates for removal or reversal to break the cycle entirely. + + + +### 2.4 Java Artifact Cyclic Dependencies + +Cyclic dependencies at the artifact (JAR) level — the coarsest and most critical abstraction. Each row is one dependency that participates in a cycle between artifacts. + + + +### 2.5 TypeScript Module Cyclic Dependencies (Overview) + +Cycle groups among TypeScript modules. Interpretation is identical to the Java package view: sorted by `forwardToBackwardBalance` descending, easiest fixes at the top. + + + +### 2.6 TypeScript Module Cyclic Dependencies (Breakdown) + +Individual dependency pairs within each TypeScript cycle group. + + + +### 2.7 TypeScript Module Cyclic Dependencies (Backward Only) + +Backward TypeScript module dependencies — the most effective candidates for breaking cycles. + + + +--- + +## 3. Java Internal Structure + +### 3.1 Java Artifact Listing + +All Java artifacts (JARs) sorted by their number of packages, types, and incoming/outgoing dependencies. Reveals the largest and most connected components in the build. Large incoming dependency counts indicate widely shared libraries; large outgoing counts indicate aggregator or application-level modules. + + + +### 3.2 Interface Segregation Principle Candidates + +Based on Robert C. 
Martin's **Interface Segregation Principle** — _"Clients should not be forced to depend upon interfaces that they do not use."_ + +This table lists Java interfaces that declare many public methods while one or more groups of callers invoke only a small subset. Each row is one such interface: how many methods it declares in total (including inherited), how many distinct callers actually use, the `usageRatio` (lower = stronger candidate), the number of distinct caller types, and the actual method names they call. Together these tell you precisely which focused sub-interface to extract. + +**Rows are sorted by priority — the first rows (lowest `usageRatio`, highest `callerCount`) are the most critical candidates.** + +Interpretation guidance: + +- `usageRatio` near 0 means callers use almost none of the API — strong extraction candidate. +- High `callerCount` with low `usageRatio` signals that extracting a small sub-interface benefits many callers (higher priority). +- `exampleCalledMethods` is the concrete set of methods to expose on the new interface. + + + + +### 3.3 Widely Used Java Types + +Types used by the highest number of _different_ packages. These are typically cross-cutting concerns, core domain objects, or shared utilities. A very high usage count means that changes to these types will ripple across many calling packages. + +**Rows are sorted by priority — the first rows (most widely used types) carry the highest risk for ripple effects.** + + + +### 3.4 Overly Broad Artifact Dependencies + +Identifies potentially unnecessary or over-broad artifact dependencies. For each artifact, shows which other artifacts depend on it and what percentage of its packages are actually used by those dependents. + +A low `usedPackagesPercent` indicates that a dependent artifact imports only a small fraction of the artifact's packages—suggesting the dependency might be unnecessarily broad. 
In such cases, the dependent artifact may only need to import a focused subset of the target artifact's packages, or might not need the dependency at all. + +**Rows are sorted by priority — the first rows (lowest `usedPackagesPercent`) are the best candidates for optimization or removal.** + + + +### 3.5 Class Usage Across Artifacts + +Classes that are used by multiple different artifacts — candidates for extraction into a shared library. High cross-artifact reuse indicates that a type has grown beyond its original module boundary. + +**Rows are sorted by priority — the first rows (most reused classes across artifacts) are the highest-value extraction candidates.** + + + +### 3.6 File Distance Distribution + +The **file distance** between a source file and the dependency it uses is the minimum number of `cd` (change directory) commands required to navigate from one to the other in the filesystem. It provides an intuitive measure of physical co-location: + +- **Distance 0**: Both files are in the same directory — tightly co-located. +- **Distance 1**: One directory apart — closely related. +- **Distance N**: N traversals required — the files are in very different parts of the source tree. + +High average distances indicate that architectural boundaries do not align with the directory structure. + + + +--- + +## 4. TypeScript Internal Structure + +### 4.1 TypeScript Module Listing + +All TypeScript modules sorted by their number of elements (functions, classes, interfaces), and incoming/outgoing dependency counts. Reveals the most central and most complex modules. + + + +### 4.2 Widely Used TypeScript Elements + +TypeScript elements (functions, classes, interfaces, type aliases) that are imported by the highest number of different modules. High usage counts indicate shared utilities or core domain concepts that many modules rely on. 
+ +**Rows are sorted by priority — the first rows (most widely used elements) carry the highest risk for ripple effects.** + + + +### 4.3 Module Element Usage by Dependent Modules + +For each TypeScript module, which other modules import it and how many of its exposed elements do they actually use? A low `usedElementsPercent` signals that the module's public API is wider than what callers need — a candidate for splitting into a leaner interface. + +**Rows are sorted by priority — the first rows (lowest `usedElementsPercent`) are the strongest candidates for a narrower API.** + + + +--- + +## 5. Path Finding + +Path finding algorithms reveal the **depth and complexity of dependency chains**. + +- **All Pairs Shortest Path (APSP)**: For every connected pair of nodes, computes the minimum number of hops (direct dependency = distance 1, one intermediary = distance 2, etc.). The **graph diameter** — the longest shortest path — is the key complexity metric: a diameter of 6 means at least one pair of modules requires a chain of 6 transitive dependencies to connect. +- **Longest Path** (for directed acyclic graphs): The maximum-length directed path through the dependency graph. Relevant for build ordering — a node can only be built after all its transitive dependencies. A long longest path means a deep sequential build chain. + +> **Note on Longest Path**: The algorithm requires a Directed Acyclic Graph (DAG). Results are unreliable when cyclic dependencies exist. Eliminate cycles first for accurate results. + +### 5.1 Java Package Path Finding + +Dependency path analysis at the Java package level. Intra-artifact pairs (both source and target in the same artifact) are highlighted; intermediate paths may cross artifact boundaries, reflecting real-world transitive coupling. + +#### 5.1.1 All Pairs Shortest Path + +The graph diameter reveals the longest shortest path among all package pairs. Higher values indicate deeper transitive dependencies and more interconnectedness. 
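The diameter can be read directly from a shortest-path distribution: it is the largest distance that still has at least one node pair. The sketch below is a minimal illustration of that idea. The CSV layout (a header line followed by unquoted `distance,pairCount` rows) and the function name `graph_diameter` are assumptions for this example, not the format of the generated reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the pipeline.
# Assumption: input is CSV with a header line and unquoted numeric
# columns "distance,pairCount".
graph_diameter() {
    awk -F',' '
        NR > 1 && $2 > 0 && $1 + 0 > max { max = $1 + 0 }   # skip header, keep largest distance with pairs
        END { print max + 0 }
    '
}

printf 'distance,pairCount\n1,120\n2,45\n3,7\n' | graph_diameter   # prints 3
```

In this sample, seven pairs are three hops apart and no pair is farther apart, so the diameter is 3.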
+ + + + + +#### 5.1.2 Longest Path + +The maximum-length path through the dependency graph shows the deepest sequential dependency chain. For Java packages, this represents the worst-case build order depth if dependencies cannot be parallelized. + + + + + +### 5.2 Java Artifact Path Finding + +Dependency path analysis at the artifact (JAR) level — the coarsest view. Useful for understanding build parallelism and maximum sequential build depth. + +#### 5.2.1 All Pairs Shortest Path + +The graph diameter at the artifact level. Artifact-level cycles are rare, so this metric is reliable for understanding transitive build dependencies. + + + + + +#### 5.2.2 Longest Path + +The longest dependency sequence at the artifact level—the critical path for sequential artifact building. + + + + + +### 5.3 TypeScript Module Path Finding + +Dependency path analysis for TypeScript modules. Comparable to the Java Package view; a long longest path indicates deep sequential import chains. + +#### 5.3.1 All Pairs Shortest Path + +The graph diameter reveals the longest shortest path among all TypeScript module pairs. Higher values indicate deeper transitive import dependencies. + + + + + +#### 5.3.2 Longest Path + +The maximum-length path through the dependency graph shows the deepest sequential import chain for TypeScript modules. + + + + + +--- + +## 6. Topological Sort + +Topological sorting assigns a **build level** to every node in the dependency graph: + +- **Level 0**: No dependencies — can be built first, in parallel with all other level-0 nodes. +- **Level N**: Depends on at least one node at level N−1. Must be built after all lower levels. + +The maximum level is the **critical path length** — the minimum number of sequential build steps even with full parallelism. Reducing this number (by removing unnecessary dependencies) directly speeds up builds. 
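The level assignment described above can be sketched with a small script. This is a minimal illustration under stated assumptions, not the Graph Data Science implementation the pipeline uses: the input format (one `node dependency` pair per line) and the function name `build_levels` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the pipeline.
# Input: one "<node> <dependency>" pair per line (node depends on dependency).
# Output: "<level> <node>" lines, sorted by level.
build_levels() {
    awk '
    {
        level[$1] = 0
        if (NF >= 2) { level[$2] = 0; edgeFrom[NR] = $1; edgeTo[NR] = $2; edges[NR] = 1 }
    }
    END {
        nodeCount = 0
        for (n in level) nodeCount++
        # Raise each node to (highest dependency level + 1) until nothing changes.
        # On an acyclic graph this settles within nodeCount passes.
        changed = 1
        for (pass = 0; changed && pass <= nodeCount; pass++) {
            changed = 0
            for (e in edges) {
                if (level[edgeFrom[e]] < level[edgeTo[e]] + 1) {
                    level[edgeFrom[e]] = level[edgeTo[e]] + 1
                    changed = 1
                }
            }
        }
        # Still changing after the pass limit means the input contains a cycle.
        if (changed) { print "cycle detected" > "/dev/stderr"; exit 1 }
        for (n in level) print level[n], n
    }' | sort -k1,1n -k2,2
}

printf 'app service\nservice core\napp core\n' | build_levels
# prints:
# 0 core
# 1 service
# 2 app
```

Here `core` has no dependencies (level 0), `service` depends on `core` (level 1), and `app` lands on level 2 because its deepest dependency, `service`, is on level 1. The last line of the output carries the maximum level.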
+ +> **Interpretation of extremes**: The node at the **highest level** is the most central — all others (transitively) must be built before it. The node at **level 0** is the most peripheral — it depends on nothing. + +### 6.1 Critical Path Lengths + +The table below summarises the maximum build level (critical path length) and the total number of sorted nodes per abstraction level. A higher `maxBuildLevel` means a deeper mandatory sequential build chain. + + + +Full topological sort results (node-level build order and level assignments) are in the abstraction-level CSV files under each subdirectory of `reports/internal-dependencies/`. + +--- + +## 7. Graph Visualizations + +Directed dependency graphs showing build levels (node color = level) and longest path structures. +Each graph uses the topological sort level to color nodes: darker colors indicate higher levels (more transitive dependencies above them). + +### 7.1 Java Artifact Graphs + +**Build levels graph**: Directed graph of Java artifacts where node color corresponds to build level. Level-0 artifacts (no dependencies) appear in the lightest color. Reveals the full artifact dependency hierarchy. + +**Longest paths graphs**: Isolated view shows only the nodes and edges on the longest dependency chain. Contributor view adds all artifacts that feed into (contribute to) that chain. + + + +### 7.2 TypeScript Module Graphs + +**Build levels graph**: TypeScript module dependency graph, colored by topological sort level. Useful for understanding module layering and identifying circular or overly deep dependency chains. + +**Longest paths graphs**: Isolated and contributor views of the longest TypeScript dependency chain. + + + +### 7.3 NPM Package Graphs + +Build level and longest path graphs for NPM packages (both production and development dependencies). + + + +--- + +## 8. 
Glossary and Column Definitions + +| Term | Definition | +|------|-----------| +| `forwardToBackwardBalance` | Ratio of forward-to-total dependencies within a cycle group. Values near 1.0 = mostly forward (few backward dependencies to remove to break the cycle); near 0.0 = mostly backward (harder to fix). | +| `numberForward` | Count of dependencies flowing in the majority direction within a cycle group. | +| `numberBackward` | Count of dependencies flowing against the majority direction — primary refactoring targets. | +| `Graph Diameter` | The longest shortest path across all pairs in the dependency graph. A measure of structural depth and complexity. Higher values indicate more transitive coupling. | +| `Longest Path` | Maximum-length directed path through the DAG — the worst-case dependency chain and the critical build path. | +| `File Distance` | Minimum number of directory traversals between a source file and the file it depends on. Distance 0 = same directory; Distance N = N `cd` commands required. | +| `Build Level` | Topological sort level. Level 0 = no dependencies; Level N = depends on nodes at levels 0 through N−1. Minimum sequential build steps = max level + 1. | +| `usedTypesPercent` | Percentage of a package's types that dependent packages actually use. Low values indicate Interface Segregation violations. | +| `usedPackagesPercent` | Percentage of packages within an artifact that dependent artifacts actually import. Low values indicate wide API surfaces. | +| `usedElementsPercent` | Percentage of TypeScript module elements that dependent modules actually import. | +| `pairCount` | Number of node pairs at a given path distance. | +| `distanceTotalPairCount` | Total number of connected node pairs across all distances (used for normalisation). | +| `isDifferentTargetProject` | Whether the source and target node belong to different projects/artifacts. 
| diff --git a/domains/internal-dependencies/summary/report_no_cycles_data.template.md b/domains/internal-dependencies/summary/report_no_cycles_data.template.md new file mode 100644 index 000000000..2f933226b --- /dev/null +++ b/domains/internal-dependencies/summary/report_no_cycles_data.template.md @@ -0,0 +1 @@ +✅ _No cyclic dependencies detected — the dependency graph is acyclic for this abstraction level._ diff --git a/domains/internal-dependencies/summary/report_no_java_data.template.md b/domains/internal-dependencies/summary/report_no_java_data.template.md new file mode 100644 index 000000000..014d50b35 --- /dev/null +++ b/domains/internal-dependencies/summary/report_no_java_data.template.md @@ -0,0 +1 @@ +⚠️ _No data available — Java not detected in this codebase._ diff --git a/domains/internal-dependencies/summary/report_no_project_context.template.md b/domains/internal-dependencies/summary/report_no_project_context.template.md new file mode 100644 index 000000000..e55eb22d6 --- /dev/null +++ b/domains/internal-dependencies/summary/report_no_project_context.template.md @@ -0,0 +1 @@ +_No project context available. For best agent results, describe the project's purpose, architecture style (e.g. 
CQRS, microservices, monolith), and primary domain concepts in this section._ diff --git a/domains/internal-dependencies/summary/report_no_typescript_data.template.md b/domains/internal-dependencies/summary/report_no_typescript_data.template.md new file mode 100644 index 000000000..735d4e18a --- /dev/null +++ b/domains/internal-dependencies/summary/report_no_typescript_data.template.md @@ -0,0 +1 @@ +⚠️ _No data available — TypeScript not detected in this codebase._ diff --git a/jupyter/DependenciesGraphJava.ipynb b/jupyter/DependenciesGraphExplorationJava.ipynb similarity index 95% rename from jupyter/DependenciesGraphJava.ipynb rename to jupyter/DependenciesGraphExplorationJava.ipynb index e34bef668..2a2be55db 100644 --- a/jupyter/DependenciesGraphJava.ipynb +++ b/jupyter/DependenciesGraphExplorationJava.ipynb @@ -7,7 +7,7 @@ "source": [ "## Artifact Dependencies\n", "\n", - "This report includes graph visualization(s) using JavaScript and might not be exportable to some document formats.\n", + "This report shows graph visualization(s) using JavaScript that are meant for exploration and interactive use and might not be exportable to some document formats. The graph visualization is implemented using the [neovis.js](https://github.com/neo4j-contrib/neovis.js) library. 
Visualizations with GraphViz turned out to be more effective, so this is not used in the final report, but the code is still available in this Jupyter notebook.\n", "\n", "### References\n", "\n", @@ -287,7 +287,7 @@ "name": "JohT" } ], - "code_graph_analysis_pipeline_data_validation": "ValidateJavaArtifactDependencies", + "code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", diff --git a/jupyter/DependenciesGraphTypescript.ipynb b/jupyter/DependenciesGraphExplorationTypescript.ipynb similarity index 94% rename from jupyter/DependenciesGraphTypescript.ipynb rename to jupyter/DependenciesGraphExplorationTypescript.ipynb index d44fedc8a..6569d4f00 100644 --- a/jupyter/DependenciesGraphTypescript.ipynb +++ b/jupyter/DependenciesGraphExplorationTypescript.ipynb @@ -7,7 +7,7 @@ "source": [ "## Artifact Dependencies\n", "\n", - "This report includes graph visualization(s) using JavaScript and might not be exportable to some document formats.\n", + "This report shows graph visualization(s) using JavaScript that are meant for exploration and interactive use and might not be exportable to some document formats. The graph visualization is implemented using the [neovis.js](https://github.com/neo4j-contrib/neovis.js) library. 
Visualizations with GraphViz turned out to be more effective, so this is not used in the final report, but the code is still available in this Jupyter notebook.\n", "\n", "### References\n", "\n", @@ -289,7 +289,7 @@ "name": "JohT" } ], - "code_graph_analysis_pipeline_data_validation": "ValidateTypescriptModuleDependencies", + "code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", @@ -307,7 +307,7 @@ "pygments_lexer": "ipython3", "version": "3.11.9" }, - "title": "Neo4j Java Code-Structure Graph" + "title": "Neo4j Typescript Code-Structure Graph" }, "nbformat": 4, "nbformat_minor": 5 diff --git a/scripts/analysis/analyze.sh b/scripts/analysis/analyze.sh index 36d1e5955..2e505e39f 100755 --- a/scripts/analysis/analyze.sh +++ b/scripts/analysis/analyze.sh @@ -50,7 +50,7 @@ LOG_GROUP_END=${LOG_GROUP_END:-"::endgroup::"} # Prefix to end a log group. Defa # Function to display script usage usage() { - echo "Usage: $0 [--report ] [--profile ] [--domain ] [--explore]" + echo "Usage: $0 [--report ] [--profile ] [--domain ] [--explore] [--keep-running]" exit 1 } @@ -59,6 +59,7 @@ analysisReportCompilation="All" settingsProfile="Default" selectedAnalysisDomain="" exploreMode=false +keepRunning=false # Function to check if a parameter value is missing (either empty or another option starting with --) is_missing_value_parameter() { @@ -92,6 +93,10 @@ while [[ $# -gt 0 ]]; do exploreMode=true shift ;; + --keep-running) + keepRunning=true + shift + ;; --domain) if is_missing_value_parameter "$1" "$2"; then echo "analyze: Error: --domain requires a value." 
@@ -141,6 +146,12 @@ echo "analyze: analysisReportCompilation=${analysisReportCompilation}" echo "analyze: settingsProfile=${settingsProfile}" echo "analyze: selectedAnalysisDomain=${selectedAnalysisDomain}" echo "analyze: exploreMode=${exploreMode}" +echo "analyze: keepRunning=${keepRunning}" + +# Print warning if --explore and --keep-running are used together +if ${exploreMode} && ${keepRunning}; then + echo "analyze: Warning: --explore implies --keep-running. The --keep-running option is redundant." +fi ## Get this "scripts/analysis" directory if not already set # Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution. @@ -217,7 +228,11 @@ fi ######################### source "${REPORT_COMPILATION_SCRIPT}" -# Stop Neo4j at the end +# Stop Neo4j at the end (unless --keep-running is set) echo "${LOG_GROUP_START}Finishing Analysis" -source "${SCRIPTS_DIR}/stopNeo4j.sh" +if ${keepRunning}; then + echo "analyze: Neo4j will keep running (--keep-running is set)." +else + source "${SCRIPTS_DIR}/stopNeo4j.sh" +fi echo "${LOG_GROUP_END}" \ No newline at end of file diff --git a/scripts/projectionFunctions.sh b/scripts/projectionFunctions.sh index 5a2af2b0d..bbf07cca1 100644 --- a/scripts/projectionFunctions.sh +++ b/scripts/projectionFunctions.sh @@ -112,6 +112,23 @@ verifyDataReadyForProjection() { fi } +# Checks if the projection already exists. +# Returns true (=0) if the projection exists. +# Returns false (=1) if the projection doesn't exist. +# Exits with an error if there are technical issues. +# Required Parameters: +# - dependencies_projection=... +# Name prefix for the in-memory projection name for dependencies. 
Example: "type-centrality" +projectionExists() { + local verificationResult + verificationResult=$( execute_cypher "${PROJECTION_CYPHER_DIR}/Dependencies_0_Check_Projection_Exists.cypher" "${@}") + if is_csv_column_greater_zero "${verificationResult}" "projectionCount"; then + true; + else + false; + fi +} + # Creates a directed Graph projection for dependencies between nodes specified by the parameter "dependencies_projection_node". # Nodes without incoming and outgoing dependencies will be filtered out using a subgraph. # diff --git a/scripts/reports/InternalDependenciesCsv.sh b/scripts/reports/InternalDependenciesCsv.sh deleted file mode 100755 index a958f8203..000000000 --- a/scripts/reports/InternalDependenciesCsv.sh +++ /dev/null @@ -1,77 +0,0 @@ -#!/usr/bin/env bash - -# Executes "Internal_Dependencies" Cypher queries to get the "internal-dependencies-csv" CSV reports. -# It contains lists of e.g. incoming and outgoing package dependencies, -# abstractness, instability and the distance to the so called "main sequence". - -# Requires executeQueryFunctions.sh, cleanupAfterReportGeneration.sh - -# Fail on any error ("-e" = exit on first error, "-o pipefail" exist on errors within piped commands) -set -o errexit -o pipefail - -# Overrideable Constants (defaults also defined in sub scripts) -REPORTS_DIRECTORY=${REPORTS_DIRECTORY:-"reports"} - -## Get this "scripts/reports" directory if not already set -# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution. -# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes. -# This way non-standard tools like readlink aren't needed. -REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR:-$( CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P )} -echo "InternalDependenciesCsv: REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR}" - -# Get the "scripts" directory by taking the path of this script and going one directory up. 
-SCRIPTS_DIR=${SCRIPTS_DIR:-"${REPORTS_SCRIPT_DIR}/.."} # Repository directory containing the shell scripts -echo "InternalDependenciesCsv: SCRIPTS_DIR=${SCRIPTS_DIR}" - -# Get the "cypher" directory by taking the path of this script and going two directory up and then to "cypher". -CYPHER_DIR=${CYPHER_DIR:-"${REPORTS_SCRIPT_DIR}/../../cypher"} -echo "InternalDependenciesCsv: CYPHER_DIR=${CYPHER_DIR}" - -# Define functions to execute cypher queries from within a given file -source "${SCRIPTS_DIR}/executeQueryFunctions.sh" - -# Create report directory -REPORT_NAME="internal-dependencies-csv" -FULL_REPORT_DIRECTORY="${REPORTS_DIRECTORY}/${REPORT_NAME}" -mkdir -p "${FULL_REPORT_DIRECTORY}" - -# Local Constants -CYCLIC_DEPENDENCIES_CYPHER_DIR="${CYPHER_DIR}/Cyclic_Dependencies" -INTERNAL_DEPENDENCIES_CYPHER_DIR="${CYPHER_DIR}/Internal_Dependencies" - -# Calculate the fewest number of change directory commands needed between dependent files as a distance metric -echo "InternalDependenciesCsv: $(date +'%Y-%m-%dT%H:%M:%S%z') Calculating distance between dependent files..." -execute_cypher_queries_until_results "${INTERNAL_DEPENDENCIES_CYPHER_DIR}/Get_file_distance_as_shortest_contains_path_for_dependencies.cypher" \ - "${INTERNAL_DEPENDENCIES_CYPHER_DIR}/Set_file_distance_as_shortest_contains_path_for_dependencies.cypher" \ - > "${FULL_REPORT_DIRECTORY}/Distance_distribution_between_dependent_files.csv" - -# Internal Dependencies for Java -echo "InternalDependenciesCsv: $(date +'%Y-%m-%dT%H:%M:%S%z') Processing internal dependencies for Java..." 
- -execute_cypher "${CYCLIC_DEPENDENCIES_CYPHER_DIR}/Cyclic_Dependencies.cypher" > "${FULL_REPORT_DIRECTORY}/Cyclic_Dependencies.csv" -execute_cypher "${CYCLIC_DEPENDENCIES_CYPHER_DIR}/Cyclic_Dependencies_Breakdown.cypher" > "${FULL_REPORT_DIRECTORY}/Cyclic_Dependencies_Breakdown.csv" -execute_cypher "${CYCLIC_DEPENDENCIES_CYPHER_DIR}/Cyclic_Dependencies_Breakdown_Backward_Only.cypher" > "${FULL_REPORT_DIRECTORY}/Cyclic_Dependencies_Breakdown_Backward_Only.csv" -execute_cypher "${CYCLIC_DEPENDENCIES_CYPHER_DIR}/Cyclic_Dependencies_between_Artifacts_as_unwinded_List.cypher" > "${FULL_REPORT_DIRECTORY}/CyclicArtifactDependenciesUnwinded.csv" - -execute_cypher "${INTERNAL_DEPENDENCIES_CYPHER_DIR}/Candidates_for_Interface_Segregation.cypher" > "${FULL_REPORT_DIRECTORY}/InterfaceSegregationCandidates.csv" - -execute_cypher "${INTERNAL_DEPENDENCIES_CYPHER_DIR}/List_all_Java_artifacts.cypher" > "${FULL_REPORT_DIRECTORY}/List_all_Java_artifacts.csv" -execute_cypher "${INTERNAL_DEPENDENCIES_CYPHER_DIR}/List_types_that_are_used_by_many_different_packages.cypher" > "${FULL_REPORT_DIRECTORY}/WidelyUsedTypes.csv" -execute_cypher "${INTERNAL_DEPENDENCIES_CYPHER_DIR}/How_many_packages_compared_to_all_existing_are_used_by_dependent_artifacts.cypher" > "${FULL_REPORT_DIRECTORY}/ArtifactPackageUsage.csv" -execute_cypher "${INTERNAL_DEPENDENCIES_CYPHER_DIR}/How_many_classes_compared_to_all_existing_in_the_same_package_are_used_by_dependent_packages_across_different_artifacts.cypher" > "${FULL_REPORT_DIRECTORY}/ClassesPerPackageUsageAcrossArtifacts.csv" - -# Internal Dependencies for TypeScript -echo "InternalDependenciesCsv: $(date +'%Y-%m-%dT%H:%M:%S%z') Processing internal dependencies for TypeScript..." 
- -execute_cypher "${CYCLIC_DEPENDENCIES_CYPHER_DIR}/Cyclic_Dependencies_for_Typescript.cypher" > "${FULL_REPORT_DIRECTORY}/Cyclic_Dependencies_for_Typescript.csv" -execute_cypher "${CYCLIC_DEPENDENCIES_CYPHER_DIR}/Cyclic_Dependencies_Breakdown_for_Typescript.cypher" > "${FULL_REPORT_DIRECTORY}/Cyclic_Dependencies_Breakdown_for_Typescript.csv" -execute_cypher "${CYCLIC_DEPENDENCIES_CYPHER_DIR}/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.cypher" > "${FULL_REPORT_DIRECTORY}/Cyclic_Dependencies_Breakdown_Backward_Only_for_Typescript.csv" - -execute_cypher "${INTERNAL_DEPENDENCIES_CYPHER_DIR}/List_all_Typescript_modules.cypher" > "${FULL_REPORT_DIRECTORY}/List_all_Typescript_modules.csv" -execute_cypher "${INTERNAL_DEPENDENCIES_CYPHER_DIR}/List_elements_that_are_used_by_many_different_modules_for_Typescript.cypher" > "${FULL_REPORT_DIRECTORY}/WidelyUsedTypescriptElements.csv" -execute_cypher "${INTERNAL_DEPENDENCIES_CYPHER_DIR}/How_many_elements_compared_to_all_existing_are_used_by_dependent_modules_for_Typescript.cypher" > "${FULL_REPORT_DIRECTORY}/ModuleElementsUsageTypescript.csv" - -# Clean-up after report generation. Empty reports will be deleted. -source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}" - -echo "InternalDependenciesCsv: $(date +'%Y-%m-%dT%H:%M:%S%z') Successfully finished." \ No newline at end of file diff --git a/scripts/reports/InternalDependenciesVisualization.sh b/scripts/reports/InternalDependenciesVisualization.sh deleted file mode 100755 index f899e24db..000000000 --- a/scripts/reports/InternalDependenciesVisualization.sh +++ /dev/null @@ -1,61 +0,0 @@ -#!/usr/bin/env bash - -# Executes selected "Internal_Dependencies" Cypher queries for GraphViz visualization. -# Visualizes dependencies across artifacts and their build levels (topologically sorted). -# It requires an already running Neo4j graph database with already scanned and analyzed artifacts. 
-# The reports (csv, dot and svg files) will be written into the sub directory reports/internal-dependencies-visualization. - -# Requires executeQueryFunctions.sh, visualizeQueryResults.sh, cleanupAfterReportGeneration.sh - -# Fail on any error ("-e" = exit on first error, "-o pipefail" exist on errors within piped commands) -set -o errexit -o pipefail - -# Overrideable Constants (defaults also defined in sub scripts) -REPORTS_DIRECTORY=${REPORTS_DIRECTORY:-"reports"} - -## Get this "scripts/reports" directory if not already set -# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution. -# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes. -# This way non-standard tools like readlink aren't needed. -REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR:-$( CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P )} -echo "InternalDependenciesVisualization: REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR}" - -# Get the "scripts" directory by taking the path of this script and going one directory up. -SCRIPTS_DIR=${SCRIPTS_DIR:-"${REPORTS_SCRIPT_DIR}/.."} # Repository directory containing the shell scripts -echo "InternalDependenciesVisualization SCRIPTS_DIR=${SCRIPTS_DIR}" - -# Get the "scripts/visualization" directory. -VISUALIZATION_SCRIPTS_DIR=${VISUALIZATION_SCRIPTS_DIR:-"${SCRIPTS_DIR}/visualization"} # Repository directory containing the shell scripts for visualization -echo "InternalDependenciesVisualization VISUALIZATION_SCRIPTS_DIR=${VISUALIZATION_SCRIPTS_DIR}" - -# Get the "cypher" directory by taking the path of this script and going two directory up and then to "cypher". 
-CYPHER_DIR=${CYPHER_DIR:-"${REPORTS_SCRIPT_DIR}/../../cypher"} -echo "InternalDependenciesVisualization CYPHER_DIR=${CYPHER_DIR}" - -INTERNAL_DEPENDENCIES_CYPHER_DIR="${CYPHER_DIR}/Internal_Dependencies" - -# Define functions to execute cypher queries from within a given file -source "${SCRIPTS_DIR}/executeQueryFunctions.sh" - -# Create report directory -REPORT_NAME="internal-dependencies-visualization" -FULL_REPORT_DIRECTORY="${REPORTS_DIRECTORY}/${REPORT_NAME}" -mkdir -p "${FULL_REPORT_DIRECTORY}" - -# Java Artifacts: Dependencies Visualization -reportName="${FULL_REPORT_DIRECTORY}/JavaArtifactBuildLevels" -execute_cypher "${INTERNAL_DEPENDENCIES_CYPHER_DIR}/Java_Artifact_build_levels_for_graphviz.cypher" > "${reportName}.csv" -source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${reportName}.csv" - -# TypeScript Modules: Dependencies Visualization -reportName="${FULL_REPORT_DIRECTORY}/TypeScriptModuleBuildLevels" -execute_cypher "${INTERNAL_DEPENDENCIES_CYPHER_DIR}/Typescript_Module_build_levels_for_graphviz.cypher" > "${reportName}.csv" -source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${reportName}.csv" - -# NPM Packages: Dependencies Visualization -reportName="${FULL_REPORT_DIRECTORY}/NpmPackageBuildLevels" -execute_cypher "${INTERNAL_DEPENDENCIES_CYPHER_DIR}/NPM_Package_build_levels_for_graphviz.cypher" > "${reportName}.csv" -source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${reportName}.csv" - -# Clean-up after report generation. Empty reports will be deleted. -source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}" \ No newline at end of file diff --git a/scripts/reports/PathFindingCsv.sh b/scripts/reports/PathFindingCsv.sh deleted file mode 100755 index d6b663fde..000000000 --- a/scripts/reports/PathFindingCsv.sh +++ /dev/null @@ -1,170 +0,0 @@ -#!/usr/bin/env bash - -# Uses path finding algorithms from the Graph Data Science Library of Neo4j and creates CSV reports. 
-# It requires an already running Neo4j graph database with already scanned and analyzed artifacts. -# The reports (csv files) will be written into the sub directory reports/path-finding-csv. -# Note that "scripts/prepareAnalysis.sh" is required to run prior to this script. - -# Requires executeQueryFunctions.sh, projectionFunctions.sh, cleanupAfterReportGeneration.sh - -# Overrideable Constants (defaults also defined in sub scripts) -REPORTS_DIRECTORY=${REPORTS_DIRECTORY:-"reports"} - -# Fail on any error ("-e" = exit on first error, "-o pipefail" exist on errors within piped commands) -set -o errexit -o pipefail - -## Get this "scripts/reports" directory if not already set -# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution. -# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes. -# This way non-standard tools like readlink aren't needed. -REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR:-$( CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P )} -echo "pathFindingCsv: REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR}" - -# Get the "scripts" directory by taking the path of this script and going one directory up. -SCRIPTS_DIR=${SCRIPTS_DIR:-"${REPORTS_SCRIPT_DIR}/.."} # Repository directory containing the shell scripts -echo "pathFindingCsv: SCRIPTS_DIR=${SCRIPTS_DIR}" - -# Get the "cypher" directory by taking the path of this script and going two directory up and then to "cypher". 
-CYPHER_DIR=${CYPHER_DIR:-"${REPORTS_SCRIPT_DIR}/../../cypher"} -echo "pathFindingCsv: CYPHER_DIR=$CYPHER_DIR" - -# Define functions to execute a cypher query from within the given file (first and only argument) -source "${SCRIPTS_DIR}/executeQueryFunctions.sh" - -# Define functions to create and delete Graph Projections like "createDirectedDependencyProjection" -source "${SCRIPTS_DIR}/projectionFunctions.sh" - -# Create report directory -REPORT_NAME="path-finding-csv" -FULL_REPORT_DIRECTORY="${REPORTS_DIRECTORY}/${REPORT_NAME}" -mkdir -p "${FULL_REPORT_DIRECTORY}" - -# Run the path finding algorithm "All Pairs Shortest Path". -# -# Required Parameters: -# - dependencies_projection=... -# Name prefix for the in-memory projection name for dependencies. Example: "type-path-finding" -# - dependencies_projection_node=... -# Label of the nodes that will be used for the projection. Example: "Type" -# - dependencies_projection_weight_property=... -# Name of the node property that contains the dependency weight. Example: "weight" -allPairsShortestPath() { - local PATH_FINDING_CYPHER_DIR="${CYPHER_DIR}/Path_Finding" - local nodeLabel; nodeLabel=$( extractQueryParameter "dependencies_projection_node" "${@}" ) - - # Run the algorithm using "stream" and write the results into a CSV file - execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_5_All_pairs_shortest_path_distribution_per_project.cypher" "${@}" > "${FULL_REPORT_DIRECTORY}/${nodeLabel}_all_pairs_shortest_paths_distribution_per_project.csv" -} - -# Run the path finding algorithm "Longest Path" (for directed acyclic graphs (DAG)). -# -# Required Parameters: -# - dependencies_projection=... -# Name prefix for the in-memory projection name for dependencies. Example: "type-path-finding" -# - dependencies_projection_node=... -# Label of the nodes that will be used for the projection. Example: "Type" -# - dependencies_projection_weight_property=... -# Name of the node property that contains the dependency weight. 
Example: "weight" -longestPath() { - local PATH_FINDING_CYPHER_DIR="${CYPHER_DIR}/Path_Finding" - local nodeLabel; nodeLabel=$( extractQueryParameter "dependencies_projection_node" "${@}" ) - - # Run the algorithm using "stream" and write the results into a CSV file - execute_cypher "${PATH_FINDING_CYPHER_DIR}/Path_Finding_6_Longest_paths_distribution_per_project.cypher" "${@}" > "${FULL_REPORT_DIRECTORY}/${nodeLabel}_longest_paths_distribution.csv" -} - -# Run all contained path finding algorithms. -# -# Required Parameters: -# - dependencies_projection=... -# Name prefix for the in-memory projection name for dependencies. Example: "artifact-path-finding" -# - dependencies_projection_node=... -# Label of the nodes that will be used for the projection. Example: "Artifact" -# - dependencies_projection_weight_property=... -# Name of the node property that contains the dependency weight. Example: "weight" -runPathFindingAlgorithms() { - time allPairsShortestPath "${@}" - time longestPath "${@}" -} - -# -- Java Artifact Path Finding ------------------------------------ - -ARTIFACT_PROJECTION="dependencies_projection=artifact-path-finding" -ARTIFACT_NODE="dependencies_projection_node=Artifact" -ARTIFACT_WEIGHT="dependencies_projection_weight_property=weight" - -if createDirectedDependencyProjection "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}"; then - runPathFindingAlgorithms "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}" -fi - -# -- Java Package Path Finding ------------------------------------- - -PACKAGE_PROJECTION="dependencies_projection=package-path-finding" -PACKAGE_NODE="dependencies_projection_node=Package" -PACKAGE_WEIGHT="dependencies_projection_weight_property=weight25PercentInterfaces" - -if createDirectedDependencyProjection "${PACKAGE_PROJECTION}" "${PACKAGE_NODE}" "${PACKAGE_WEIGHT}"; then - runPathFindingAlgorithms "${PACKAGE_PROJECTION}" "${PACKAGE_NODE}" "${PACKAGE_WEIGHT}" -fi - -# -- Java Type Path Finding 
---------------------------------------- -# Note: This is deactivated for now. It might be too granular to be valuable and require too many resources, - -#TYPE_PROJECTION="dependencies_projection=type-path-finding" -#TYPE_NODE="dependencies_projection_node=Type" -#TYPE_WEIGHT="dependencies_projection_weight_property=weight" -# -#if createDirectedJavaTypeDependencyProjection "${TYPE_PROJECTION}" "${TYPE_NODE}" "${TYPE_WEIGHT}"; then -# runPathFindingAlgorithms "${TYPE_PROJECTION}" "${TYPE_NODE}" "${TYPE_WEIGHT}" -#fi - -# -- Java Method Path Finding -------------------------------------- -# Note: This is deactivated for now. It might be too granular to be valuable and require too many resources, - -#METHOD_PROJECTION="dependencies_projection=method-path-finding" -#METHOD_NODE="dependencies_projection_node=Method" -#METHOD_WEIGHT="dependencies_projection_weight_property=" - -#if createDirectedJavaMethodDependencyProjection "${METHOD_PROJECTION}"; then -# runPathFindingAlgorithms "${METHOD_PROJECTION}" "${METHOD_NODE}" "${METHOD_WEIGHT}" -#fi - -# -- Typescript Modules Path Finding ------------------------------- - -MODULE_LANGUAGE="dependencies_projection_language=Typescript" -MODULE_PROJECTION="dependencies_projection=typescript-module-path-finding" -MODULE_NODE="dependencies_projection_node=Module" -MODULE_WEIGHT="dependencies_projection_weight_property=lowCouplingElement25PercentWeight" - -if createDirectedDependencyProjection "${MODULE_LANGUAGE}" "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}"; then - runPathFindingAlgorithms "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}" -fi - -# -- Non Dev NPM Package Path Finding ------------------------------- - -NPM_LANGUAGE="dependencies_projection_language=NPM" -NPM_PROJECTION="dependencies_projection=npm-non-dev-package-path-finding" -NPM_NODE="dependencies_projection_node=NpmNonDevPackage" -NPM_WEIGHT="dependencies_projection_weight_property=weightByDependencyType" - -if 
createDirectedDependencyProjection "${NPM_LANGUAGE}" "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}"; then - runPathFindingAlgorithms "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}" -fi - -# -- Dev NPM Package Path Finding ------------------------------- - -NPM_DEV_LANGUAGE="dependencies_projection_language=NPM" -NPM_DEV_PROJECTION="dependencies_projection=npm-dev-package-path-finding" -NPM_DEV_NODE="dependencies_projection_node=NpmDevPackage" -NPM_DEV_WEIGHT="dependencies_projection_weight_property=weightByDependencyType" - -if createDirectedDependencyProjection "${NPM_DEV_LANGUAGE}" "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}"; then - runPathFindingAlgorithms "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}" -fi - -# --------------------------------------------------------------- - -# Clean-up after report generation. Empty reports will be deleted. -source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}" - -echo "pathFindingCsv: $(date +'%Y-%m-%dT%H:%M:%S%z') Successfully finished." \ No newline at end of file diff --git a/scripts/reports/PathFindingVisualization.sh b/scripts/reports/PathFindingVisualization.sh deleted file mode 100755 index 22deb903c..000000000 --- a/scripts/reports/PathFindingVisualization.sh +++ /dev/null @@ -1,139 +0,0 @@ -#!/usr/bin/env bash - -# Executes selected "Path_Finding" Cypher queries for GraphViz visualization. -# Visualizes Java Artifact, TypeScript Module and NPM Package dependencies with their longest paths. -# -# It requires an already running Neo4j graph database with already scanned and analyzed artifacts. -# The reports (csv, dot and svg files) will be written into the sub directory reports/path-finding-visualization. 
- -# Requires executeQueryFunctions.sh, projectionFunctions.sh, visualizeQueryResults.sh, cleanupAfterReportGeneration.sh - -# Fail on any error ("-e" = exit on first error, "-o pipefail" exist on errors within piped commands) -set -o errexit -o pipefail - -# Overrideable Constants (defaults also defined in sub scripts) -REPORTS_DIRECTORY=${REPORTS_DIRECTORY:-"reports"} -SCRIPT_NAME="PathFindingVisualization" -## Get this "scripts/reports" directory if not already set -# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution. -# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes. -# This way non-standard tools like readlink aren't needed. -REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR:-$( CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P )} -echo "${SCRIPT_NAME}: REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR}" - -# Get the "scripts" directory by taking the path of this script and going one directory up. -SCRIPTS_DIR=${SCRIPTS_DIR:-"${REPORTS_SCRIPT_DIR}/.."} # Repository directory containing the shell scripts -echo "${SCRIPT_NAME}: SCRIPTS_DIR=${SCRIPTS_DIR}" - -# Get the "scripts/visualization" directory. -VISUALIZATION_SCRIPTS_DIR=${VISUALIZATION_SCRIPTS_DIR:-"${SCRIPTS_DIR}/visualization"} # Repository directory containing the shell scripts for visualization -echo "${SCRIPT_NAME}: VISUALIZATION_SCRIPTS_DIR=${VISUALIZATION_SCRIPTS_DIR}" - -# Get the "cypher" directory by taking the path of this script and going two directory up and then to "cypher". 
-CYPHER_DIR=${CYPHER_DIR:-"${REPORTS_SCRIPT_DIR}/../../cypher"} -echo "${SCRIPT_NAME}: CYPHER_DIR=${CYPHER_DIR}" - -PATH_FINDINGS_CYPHER_DIR="${CYPHER_DIR}/Path_Finding" -TOPOLOGICAL_SORT_CYPHER_DIR="${CYPHER_DIR}/Topological_Sort" - -# Define functions to execute cypher queries from within a given file like execute_cypher and execute_cypher_queries_until_results -source "${SCRIPTS_DIR}/executeQueryFunctions.sh" - -# Define functions to create and delete Graph Projections like "createDirectedDependencyProjection" -source "${SCRIPTS_DIR}/projectionFunctions.sh" - -# Create report directory -REPORT_NAME="path-finding-visualization" -FULL_REPORT_DIRECTORY="${REPORTS_DIRECTORY}/${REPORT_NAME}" -mkdir -p "${FULL_REPORT_DIRECTORY}" - -# Java Artifacts: Longest Paths Visualization -ARTIFACT_PROJECTION="dependencies_projection=artifact-path-finding" -ARTIFACT_NODE="dependencies_projection_node=Artifact" -ARTIFACT_WEIGHT="dependencies_projection_weight_property=weight" - -if createDirectedDependencyProjection "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}"; then - # Determines topological sort max distance from source if not already done for level info in visualization. - execute_cypher_queries_until_results "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Exists.cypher" \ - "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Write.cypher" "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" - - reportName="JavaArtifactLongestPathsIsolated" - echo "${SCRIPT_NAME}: Creating visualization ${reportName}..." - execute_cypher "${PATH_FINDINGS_CYPHER_DIR}/Path_Finding_6_Longest_paths_for_graphviz.cypher" "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}" > "${FULL_REPORT_DIRECTORY}/${reportName}.csv" - source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${FULL_REPORT_DIRECTORY}/${reportName}.csv" - - reportName="JavaArtifactLongestPaths" - echo "${SCRIPT_NAME}: Creating visualization ${reportName}..." 
- execute_cypher "${PATH_FINDINGS_CYPHER_DIR}/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher" "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}" > "${FULL_REPORT_DIRECTORY}/${reportName}.csv" - source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${FULL_REPORT_DIRECTORY}/${reportName}.csv" -fi - -# TypeScript Modules: Longest Paths Visualization -MODULE_LANGUAGE="dependencies_projection_language=Typescript" -MODULE_PROJECTION="dependencies_projection=typescript-module-path-finding" -MODULE_NODE="dependencies_projection_node=Module" -MODULE_WEIGHT="dependencies_projection_weight_property=lowCouplingElement25PercentWeight" - -if createDirectedDependencyProjection "${MODULE_LANGUAGE}" "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}"; then - # Determines topological sort max distance from source if not already done for level info in visualization. - execute_cypher_queries_until_results "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Exists.cypher" \ - "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Write.cypher" "${MODULE_PROJECTION}" "${MODULE_NODE}" - - reportName="TypeScriptModuleLongestPathsIsolated" - echo "${SCRIPT_NAME}: Creating visualization ${reportName}..." - execute_cypher "${PATH_FINDINGS_CYPHER_DIR}/Path_Finding_6_Longest_paths_for_graphviz.cypher" "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}" > "${FULL_REPORT_DIRECTORY}/${reportName}.csv" - source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${FULL_REPORT_DIRECTORY}/${reportName}.csv" - - reportName="TypeScriptModuleLongestPaths" - echo "${SCRIPT_NAME}: Creating visualization ${reportName}..." 
- execute_cypher "${PATH_FINDINGS_CYPHER_DIR}/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher" "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}" > "${FULL_REPORT_DIRECTORY}/${reportName}.csv" - source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${FULL_REPORT_DIRECTORY}/${reportName}.csv" -fi - -# Non Dev NPM Packages: Longest Paths Visualization -NPM_LANGUAGE="dependencies_projection_language=NPM" -NPM_PROJECTION="dependencies_projection=npm-non-dev-package-path-finding" -NPM_NODE="dependencies_projection_node=NpmNonDevPackage" -NPM_WEIGHT="dependencies_projection_weight_property=weightByDependencyType" - -if createDirectedDependencyProjection "${NPM_LANGUAGE}" "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}"; then - # Determines topological sort max distance from source if not already done for level info in visualization. - execute_cypher_queries_until_results "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Exists.cypher" \ - "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Write.cypher" "${NPM_PROJECTION}" "${NPM_NODE}" - - reportName="NpmNonDevPackageLongestPathsIsolated" - echo "${SCRIPT_NAME}: Creating visualization ${reportName}..." - execute_cypher "${PATH_FINDINGS_CYPHER_DIR}/Path_Finding_6_Longest_paths_for_graphviz.cypher" "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}" > "${FULL_REPORT_DIRECTORY}/${reportName}.csv" - source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${FULL_REPORT_DIRECTORY}/${reportName}.csv" - - reportName="NpmNonDevPackageLongestPaths" - echo "${SCRIPT_NAME}: Creating visualization ${reportName}..." 
- execute_cypher "${PATH_FINDINGS_CYPHER_DIR}/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher" "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}" > "${FULL_REPORT_DIRECTORY}/${reportName}.csv" - source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${FULL_REPORT_DIRECTORY}/${reportName}.csv" -fi - -# Dev NPM Packages: Longest Paths Visualization - -NPM_DEV_LANGUAGE="dependencies_projection_language=NPM" -NPM_DEV_PROJECTION="dependencies_projection=npm-dev-package-path-finding" -NPM_DEV_NODE="dependencies_projection_node=NpmDevPackage" -NPM_DEV_WEIGHT="dependencies_projection_weight_property=weightByDependencyType" - -if createDirectedDependencyProjection "${NPM_DEV_LANGUAGE}" "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}"; then - # Determines topological sort max distance from source if not already done for level info in visualization. - execute_cypher_queries_until_results "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Exists.cypher" \ - "${TOPOLOGICAL_SORT_CYPHER_DIR}/Topological_Sort_Write.cypher" "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" - - reportName="NpmDevPackageLongestPathsIsolated" - echo "${SCRIPT_NAME}: Creating visualization ${reportName}..." - execute_cypher "${PATH_FINDINGS_CYPHER_DIR}/Path_Finding_6_Longest_paths_for_graphviz.cypher" "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}" > "${FULL_REPORT_DIRECTORY}/${reportName}.csv" - source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${FULL_REPORT_DIRECTORY}/${reportName}.csv" - - reportName="NpmDevPackageLongestPaths" - echo "${SCRIPT_NAME}: Creating visualization ${reportName}..." 
- execute_cypher "${PATH_FINDINGS_CYPHER_DIR}/Path_Finding_6_Longest_paths_contributors_for_graphviz.cypher" "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}" > "${FULL_REPORT_DIRECTORY}/${reportName}.csv" - source "${VISUALIZATION_SCRIPTS_DIR}/visualizeQueryResults.sh" "${FULL_REPORT_DIRECTORY}/${reportName}.csv" -fi - -# Clean-up after report generation. Empty reports will be deleted. -source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}" \ No newline at end of file diff --git a/scripts/reports/TopologicalSortCsv.sh b/scripts/reports/TopologicalSortCsv.sh deleted file mode 100755 index 400b0b1f0..000000000 --- a/scripts/reports/TopologicalSortCsv.sh +++ /dev/null @@ -1,131 +0,0 @@ -#!/usr/bin/env bash - -# Applies the Topological Sorting algorithm for directed acyclic graphs (DAG) to order code units by their dependencies -# using Graph Data Science Library of Neo4j and creates CSV reports. -# This is useful to get the build order and build levels for modules that depend on each other. -# It requires an already running Neo4j graph database with already scanned and analyzed artifacts. -# The reports (csv files) will be written into the sub directory reports/topology-csv. -# Note that "scripts/prepareAnalysis.sh" is required to run prior to this script. - -# Requires executeQueryFunctions.sh, projectionFunctions.sh, cleanupAfterReportGeneration.sh - -# Fail on any error ("-e" = exit on first error, "-o pipefail" exist on errors within piped commands) -set -o errexit -o pipefail - -# Overrideable constants (defaults also defined in sub scripts) -REPORTS_DIRECTORY=${REPORTS_DIRECTORY:-"reports"} - -## Get this "scripts/reports" directory if not already set -# Even if $BASH_SOURCE is made for Bourne-like shells it is also supported by others and therefore here the preferred solution. -# CDPATH reduces the scope of the cd command to potentially prevent unintended directory changes. 
-# This way non-standard tools like readlink aren't needed. -REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR:-$( CDPATH=. cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P )} -echo "topologicalSortCsv: REPORTS_SCRIPT_DIR=${REPORTS_SCRIPT_DIR}" - -# Get the "scripts" directory by taking the path of this script and going one directory up. -SCRIPTS_DIR=${SCRIPTS_DIR:-"${REPORTS_SCRIPT_DIR}/.."} # Repository directory containing the shell scripts -echo "topologicalSortCsv: SCRIPTS_DIR=${SCRIPTS_DIR}" - -# Get the "cypher" directory by taking the path of this script and going two directory up and then to "cypher". -CYPHER_DIR=${CYPHER_DIR:-"${REPORTS_SCRIPT_DIR}/../../cypher"} -echo "topologicalSortCsv: CYPHER_DIR=$CYPHER_DIR" - -# Define functions to execute a cypher query from within the given file (first and only argument) -source "${SCRIPTS_DIR}/executeQueryFunctions.sh" - -# Define functions to create and delete Graph Projections like "createDirectedDependencyProjection" -source "${SCRIPTS_DIR}/projectionFunctions.sh" - -# Create report directory -REPORT_NAME="topology-csv" -FULL_REPORT_DIRECTORY="${REPORTS_DIRECTORY}/${REPORT_NAME}" -mkdir -p "${FULL_REPORT_DIRECTORY}" - -# Apply the algorithm "Topological Sort". -# -# Required Parameters: -# - dependencies_projection=... -# Name prefix for the in-memory projection name for dependencies. Example: "package" -# - dependencies_projection_node=... -# Label of the nodes that will be used for the projection. Example: "Package" -# - dependencies_projection_weight_property=... -# Name of the node property that contains the dependency weight. 
Example: "weight" -topologicalSort() { - local TOPOLOGICAL_SORT_DIR="$CYPHER_DIR/Topological_Sort" - - # Update Graph (node properties) - execute_cypher "${TOPOLOGICAL_SORT_DIR}/Topological_Sort_Write.cypher" "${@}" - - # Stream to CSV - local nodeLabel - nodeLabel=$( extractQueryParameter "dependencies_projection_node" "${@}" ) - execute_cypher "${TOPOLOGICAL_SORT_DIR}/Topological_Sort_Query.cypher" "${@}" > "${FULL_REPORT_DIRECTORY}/${nodeLabel}_Topological_Sort.csv" -} - -# -- Java Artifact Topology -------------------------------------- - -ARTIFACT_PROJECTION="dependencies_projection=artifact-topology" -ARTIFACT_NODE="dependencies_projection_node=Artifact" -ARTIFACT_WEIGHT="dependencies_projection_weight_property=weight" - -if createDirectedDependencyProjection "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}"; then - time topologicalSort "${ARTIFACT_PROJECTION}" "${ARTIFACT_NODE}" "${ARTIFACT_WEIGHT}" -fi - -# -- Java Package Topology --------------------------------------- - -PACKAGE_PROJECTION="dependencies_projection=package-topology" -PACKAGE_NODE="dependencies_projection_node=Package" -PACKAGE_WEIGHT="dependencies_projection_weight_property=weight25PercentInterfaces" - -if createDirectedDependencyProjection "${PACKAGE_PROJECTION}" "${PACKAGE_NODE}" "${PACKAGE_WEIGHT}"; then - time topologicalSort "${PACKAGE_PROJECTION}" "${PACKAGE_NODE}" "${PACKAGE_WEIGHT}" -fi - -# -- Java Type Topology ------------------------------------------ - -TYPE_PROJECTION="dependencies_projection=type-topology" -TYPE_NODE="dependencies_projection_node=Type" -TYPE_WEIGHT="dependencies_projection_weight_property=weight" - -if createDirectedJavaTypeDependencyProjection "${TYPE_PROJECTION}" "${TYPE_NODE}" "${TYPE_WEIGHT}"; then - time topologicalSort "${TYPE_PROJECTION}" "${TYPE_NODE}" "${TYPE_WEIGHT}" -fi - -# -- Typescript Module Topology --------------------------------------- - -MODULE_LANGUAGE="dependencies_projection_language=Typescript" 
-MODULE_PROJECTION="dependencies_projection=typescript-module-topology" -MODULE_NODE="dependencies_projection_node=Module" -MODULE_WEIGHT="dependencies_projection_weight_property=lowCouplingElement25PercentWeight" - -if createDirectedDependencyProjection "${MODULE_LANGUAGE}" "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}"; then - time topologicalSort "${MODULE_PROJECTION}" "${MODULE_NODE}" "${MODULE_WEIGHT}" -fi - -# -- Non Dev NPM Package Topology --------------------------------------- - -NPM_LANGUAGE="dependencies_projection_language=NPM" -NPM_PROJECTION="dependencies_projection=npm-non-dev-package-topology" -NPM_NODE="dependencies_projection_node=NpmNonDevPackage" -NPM_WEIGHT="dependencies_projection_weight_property=weightByDependencyType" - -if createDirectedDependencyProjection "${NPM_LANGUAGE}" "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}"; then - time topologicalSort "${NPM_PROJECTION}" "${NPM_NODE}" "${NPM_WEIGHT}" -fi - -# -- Dev NPM Package Topology -------------------------------------------- - -NPM_DEV_PROJECTION="dependencies_projection=npm-dev-package-topology" -NPM_DEV_NODE="dependencies_projection_node=NpmDevPackage" -NPM_DEV_WEIGHT="dependencies_projection_weight_property=weightByDependencyType" - -if createDirectedDependencyProjection "${NPM_LANGUAGE}" "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}"; then - time topologicalSort "${NPM_DEV_PROJECTION}" "${NPM_DEV_NODE}" "${NPM_DEV_WEIGHT}" -fi -# ---------------------------------------------------------------------- - -# Clean-up after report generation. Empty reports will be deleted. -source "${SCRIPTS_DIR}/cleanupAfterReportGeneration.sh" "${FULL_REPORT_DIRECTORY}" - -echo "topologicalSortCsv: $(date +'%Y-%m-%dT%H:%M:%S%z') Successfully finished." 
\ No newline at end of file diff --git a/scripts/stopNeo4j.sh b/scripts/stopNeo4j.sh index 290e61833..2b291f77b 100755 --- a/scripts/stopNeo4j.sh +++ b/scripts/stopNeo4j.sh @@ -51,15 +51,15 @@ else exit 1 fi -# Include operation system function to for example detect Windows. +# Include operating system functions like isWindows, for example to detect Windows. source "${SCRIPTS_DIR}/operatingSystemFunctions.sh" -# Include functions to check or wait for the database to be ready +# Include functions like isDatabaseQueryable to check or wait for the database to be ready. source "${SCRIPTS_DIR}/waitForNeo4jHttpFunctions.sh" # Check if Neo4j is stopped (not running) using a temporary NEO4J_HOME environment variable that points to the current installation isDatabaseReady=$(isDatabaseQueryable) -if [[ ${isDatabaseReady} == "false" ]]; then +if [ "${isDatabaseReady}" = "false" ]; then echo "stopNeo4j: ${neo4j_directory} already stopped" exit 0 else @@ -71,9 +71,9 @@ else fi fi # Check if Neo4j is still not running using a temporary NEO4J_HOME environment variable that points to the current installation isDatabaseReady=$(isDatabaseQueryable) -if [[ ${isDatabaseReady} == "false" ]]; then +if [ "${isDatabaseReady}" = "false" ]; then echo "stopNeo4j: Successfully stopped ${neo4j_directory}" else if ! 
isWindows; then @@ -86,11 +86,24 @@ fi if isWindows; then echo "stopNeo4j: Skipping detection of processes listening to port ${NEO4J_HTTP_PORT} on Windows" else - port_listener_process_id=$( lsof -t -i:"${NEO4J_HTTP_PORT}" -sTCP:LISTEN || true ) - if [ -n "${port_listener_process_id}" ]; then - echo "stopNeo4j: Terminating the following process that still listens to port ${NEO4J_HTTP_PORT}" - ps -p "${port_listener_process_id}" - # Terminate the process that is listening to the Neo4j HTTP port - kill -9 "${port_listener_process_id}" + port_listener_process_ids=$( lsof -t -i:"${NEO4J_HTTP_PORT}" -sTCP:LISTEN 2>/dev/null || true ) + if [ -n "${port_listener_process_ids}" ]; then + echo "stopNeo4j: Gracefully terminating process(es) listening to port ${NEO4J_HTTP_PORT}" + # Display process info + echo "${port_listener_process_ids}" | tr '\n' ',' | sed 's/,$//' | xargs -I {} ps -p {} || true + # Try graceful shutdown first with SIGTERM on each PID + echo "${port_listener_process_ids}" | while read -r pid; do + [ -n "${pid}" ] && kill -TERM "${pid}" 2>/dev/null || true + done + sleep 20 + # Check if process(es) are still alive and force kill if necessary + echo "${port_listener_process_ids}" | while read -r pid; do + [ -n "${pid}" ] && kill -0 "${pid}" 2>/dev/null && { + echo "stopNeo4j: Process ${pid} still alive after SIGTERM — sending SIGKILL" + kill -KILL "${pid}" 2>/dev/null || true + } || true + done + else + echo "stopNeo4j: No processes listening to port ${NEO4J_HTTP_PORT} detected. Assuming Neo4j is stopped." fi fi \ No newline at end of file