Commit eaf0970

Introduce internal-dependencies domain
1 parent 787cb2e commit eaf0970

64 files changed

Lines changed: 8056 additions & 0 deletions

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
## Plan: Create External Dependencies Domain

Create a new self-contained `domains/external-dependencies/` domain following the `anomaly-detection` reference pattern. Copy all 35 Cypher queries and the existing CSV shell script, convert both Jupyter notebooks (Java + TypeScript) to a Python SVG chart generator, copy the notebooks into an `explore/` folder with validation disabled, and assemble a Markdown summary report optimized for human and AI agent consumption. No moves or deletions of originals. No graph visualizations yet.

### Decisions

- **ExternalDependenciesCsv.sh**: Keep both (original in [scripts/reports/ExternalDependenciesCsv.sh](scripts/reports/ExternalDependenciesCsv.sh) unchanged, copy in domain)
- **Cypher files**: Copy all 35 into domain for self-containment
- **Charts**: ALL charts from both notebooks converted to Python SVG output (~38 total)
- **Package.json queries**: Included in domain
- **Reset folder**: Not included (external type labels are a central enrichment prerequisite)
- **Markdown summary**: Rich, AI-agent-actionable descriptions with architectural guidance
- **Entry point naming**: camelCase prefix (`externalDependenciesCsv.sh`, etc.)

### Prerequisites (Documented in README, Not Copied)

The following are provided by the central pipeline and must run *before* this domain:

1. **Neo4j running** with scanned artifacts loaded
2. **DEPENDS_ON relationships** between types (jQAssistant scan)
3. **Type labels** ([cypher/Types/](cypher/Types/)): base Java types, built-in types, resolved duplicates
4. **Weight properties** on DEPENDS_ON ([cypher/DependsOn_Relationship_Weights/](cypher/DependsOn_Relationship_Weights/)): `weight`, `weightInterfaces`
5. **TypeScript enrichment** ([cypher/Typescript_Enrichment/](cypher/Typescript_Enrichment/)): module properties, namespace, `isNodeModule`, `IS_IMPLEMENTED_IN` resolution, npm linking
6. **General enrichment** ([cypher/General_Enrichment/](cypher/General_Enrichment/)): file name/extension properties

### Domain Directory Structure

```
domains/external-dependencies/
├── README.md
├── externalDependenciesCsv.sh          # Entry point: CSV reports (*Csv.sh)
├── externalDependenciesPython.sh       # Entry point: Python charts (*Python.sh)
├── externalDependenciesMarkdown.sh     # Entry point: Markdown summary (*Markdown.sh)
├── externalDependencyCharts.py         # Chart generation: pie, bar, scatter → SVG
├── explore/
│   ├── ExternalDependenciesJava.ipynb        # Original notebook (ValidateAlwaysFalse)
│   └── ExternalDependenciesTypescript.ipynb  # Original notebook (ValidateAlwaysFalse)
├── queries/
│   └── (all 35 .cypher files from cypher/External_Dependencies/)
└── summary/
    ├── externalDependenciesSummary.sh  # Markdown assembly logic
    └── report.template.md              # Main report template
```

---

### Steps

#### Phase 1: Scaffolding & Cypher Queries

**1.1** Create domain directory structure with all subdirectories.

**1.2** Copy all 35 `.cypher` files from [cypher/External_Dependencies/](cypher/External_Dependencies/) into `domains/external-dependencies/queries/`.

**1.3** Create `README.md`: domain overview, prerequisites, entry points, folder structure (matching [anomaly-detection README](domains/anomaly-detection/README.md) format).

#### Phase 2: CSV Entry Point Script

**2.1** Create `externalDependenciesCsv.sh`:
- Follow exact boilerplate pattern of [anomalyDetectionCsv.sh](domains/anomaly-detection/anomalyDetectionCsv.sh) (BASH_SOURCE/CDPATH, `set -o errexit -o pipefail`, script directory resolution)
- Source `../../scripts/executeQueryFunctions.sh` for `execute_cypher()` and `execute_cypher_queries_until_results()`
- Source `../../scripts/cleanupAfterReportGeneration.sh` for cleanup
- Report directory: `reports/external-dependencies`
- First check/create ExternalType labels, then execute all 24+ queries → CSV files
- Replicate query ordering from [ExternalDependenciesCsv.sh](scripts/reports/ExternalDependenciesCsv.sh)

#### Phase 3: Python Chart Generation Script (*parallel with Phase 4*)

**3.1** Create `externalDependencyCharts.py`:
- Follow the `Parameters` class pattern from [anomalyDetectionFeaturePlots.py](domains/anomaly-detection/anomalyDetectionFeaturePlots.py) with the `--report_directory`, `--verbose`, and `--language` parameters
- Neo4j connection using `neo4j` Python driver (same pattern as anomaly detection)
- Load and execute the `.cypher` files from the `queries/` directory
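
The command line handling and query loading in step 3.1 could look roughly like the sketch below. It uses only the standard library and does not reproduce the actual `Parameters` class from the anomaly detection domain. The function names `parse_parameters` and `load_cypher_query` and the default values are assumptions made for illustration.

```python
import argparse
from pathlib import Path


def parse_parameters(argv=None) -> argparse.Namespace:
    """Parse the three command line options named in step 3.1."""
    parser = argparse.ArgumentParser(description="Generate external dependency charts as SVG files.")
    # The defaults below are assumptions, not taken from the plan.
    parser.add_argument("--report_directory", type=Path, default=Path("."),
                        help="Directory the SVG files are written to.")
    parser.add_argument("--language", default="Java",
                        help="Language to chart, for example Java or Typescript.")
    parser.add_argument("--verbose", action="store_true", help="Print progress details.")
    return parser.parse_args(argv)


def load_cypher_query(queries_directory: Path, file_name: str) -> str:
    """Read one .cypher file from the queries directory and return its text."""
    return (queries_directory / file_name).read_text(encoding="utf-8")
```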

**3.2** Data processing functions (extracted from notebooks):
- `group_to_others_below_threshold(data_frame, value_column, name_column, threshold)` → DataFrame
- `filter_values_below_threshold(data_frame, value_column, upper_limit)` → DataFrame
- `explode_pie_slice(data_frame, index_value, base_value, emphasize_value)` → array
- `plot_pie_chart(data_frame, title, file_path)` → saves SVG with percentage labels, legend, "others" explode
- `plot_stacked_bar_chart(pivot_data, title, xlabel, ylabel, file_path)` → saves SVG
- `plot_scatter_with_annotations(data_frame, x, y, size, color, title, file_path, annotations)` → saves SVG
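
As an illustration of the first helper in step 3.2, here is a minimal pandas sketch of `group_to_others_below_threshold`. It assumes that `threshold` is a percentage of the column total and that the merged row is named "others". The notebooks may define both differently, so treat this as one possible reading of the signature, not the extracted implementation.

```python
import pandas as pd


def group_to_others_below_threshold(data_frame: pd.DataFrame, value_column: str,
                                    name_column: str, threshold: float) -> pd.DataFrame:
    """Merge rows whose share of the total is below `threshold` percent into one "others" row.

    Assumption: `threshold` is a percentage of the column total.
    Only the name and value columns are kept in the result.
    """
    total = data_frame[value_column].sum()
    if total == 0:
        return data_frame.copy()
    share_in_percent = data_frame[value_column] / total * 100.0
    is_small = share_in_percent < threshold
    kept_rows = data_frame.loc[~is_small, [name_column, value_column]]
    if not is_small.any():
        return kept_rows.reset_index(drop=True)
    others_row = pd.DataFrame({
        name_column: ["others"],
        value_column: [data_frame.loc[is_small, value_column].sum()],
    })
    # The "others" row is appended last so that it can be emphasized in a pie chart.
    return pd.concat([kept_rows, others_row], ignore_index=True)
```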

**3.3** Java chart generation (**22 charts**):
- 16 pie charts: top packages × {types, packages} × {overall, drill-down} × {full, second-level} + spread variants
- 2 stacked bar charts: external packages per artifact (full + second-level)
- 2 scatter plots: max/median internal package percentage vs external package count

**3.4** TypeScript chart generation (**~16 charts**):
- 16 pie charts: modules × {elements, modules} × {overall, drill-down} + namespace variants + spread

**3.5** `main()` function: parse arguments, connect Neo4j, generate charts per language, handle "no data" gracefully.

#### Phase 4: Python Entry Point Script (*parallel with Phase 3*)

**4.1** Create `externalDependenciesPython.sh`:
- Follow pattern of [anomalyDetectionPython.sh](domains/anomaly-detection/anomalyDetectionPython.sh)
- Set script directory, report directory
- Call `python externalDependencyCharts.py --language Java --report_directory ...`
- Call again with `--language Typescript`

#### Phase 5: Markdown Summary

**5.1** Create `summary/report.template.md`:
- YAML front matter (title, date, model version)
- Section 1: Executive Overview (total external packages/modules, key frameworks identified)
- Section 2: Java External Dependencies (most used, spread analysis, per-artifact, aggregated), with `<!-- include:... -->` placeholders for tables and SVG references
- Section 3: TypeScript External Dependencies (modules, namespaces, per-module breakdown, package.json)
- Section 4: Architectural Recommendations (Hexagonal Architecture, Anti-Corruption Layer guidance)
- Section 5: Glossary & Column Definitions (all column descriptions from notebook markdown)

**5.2** Create `summary/externalDependenciesSummary.sh`:
- Follow [anomalyDetectionSummary.sh](domains/anomaly-detection/summary/anomalyDetectionSummary.sh) pattern
- Read CSV files → generate Markdown table snippets to be included in the template
- Conditionally include SVG chart references if files exist
- Generate front matter (title, date, git tag)
- Assemble final `external_dependencies_report.md`
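
The plan implements the assembly in a shell script, but the two transformations in step 5.2 (CSV to Markdown table, placeholder substitution) can be sketched in Python to make the intended behavior concrete. The exact placeholder syntax is elided in the plan, so the pattern `<!-- include:file_name -->` and the function names below are assumptions for illustration only. Cell values are not escaped, so a `|` inside a CSV value would break the table.

```python
import csv
import io
import re
from pathlib import Path

# Assumed placeholder syntax; the real template may use a different form.
INCLUDE_PATTERN = re.compile(r"<!--\s*include:\s*([^\s>]+)\s*-->")


def csv_to_markdown_table(csv_text: str) -> str:
    """Convert CSV text with a header row into a Markdown table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *body = rows
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def resolve_includes(template_text: str, include_directory: Path) -> str:
    """Replace each placeholder with the named file's content, or with nothing if the file is missing."""
    def replace(match: re.Match) -> str:
        include_file = include_directory / match.group(1)
        if include_file.is_file():
            return include_file.read_text(encoding="utf-8").strip()
        return ""  # Missing snippets (for example charts that were not generated) are skipped.
    return INCLUDE_PATTERN.sub(replace, template_text)
```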

**5.3** Create `externalDependenciesMarkdown.sh`:
- Thin delegator to `summary/externalDependenciesSummary.sh` (matching [anomalyDetectionMarkdown.sh](domains/anomaly-detection/anomalyDetectionMarkdown.sh) pattern)

#### Phase 6: Exploration Notebooks

**6.1** Copy [jupyter/ExternalDependenciesJava.ipynb](jupyter/ExternalDependenciesJava.ipynb) to `explore/ExternalDependenciesJava.ipynb` and add `"code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse"` to its metadata (matching [AnomalyDetectionExploration.ipynb](domains/anomaly-detection/explore/AnomalyDetectionExploration.ipynb)).

**6.2** Copy [jupyter/ExternalDependenciesTypescript.ipynb](jupyter/ExternalDependenciesTypescript.ipynb) to `explore/ExternalDependenciesTypescript.ipynb` and apply the same metadata update.
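
Since a notebook is a JSON document, the copy-and-mark step in 6.1 and 6.2 can be done without opening Jupyter. The sketch below assumes the key belongs in the notebook's top-level `metadata` object, which is worth confirming against the referenced `AnomalyDetectionExploration.ipynb`. The function name is hypothetical.

```python
import json
from pathlib import Path

VALIDATION_KEY = "code_graph_analysis_pipeline_data_validation"


def copy_notebook_with_validation_disabled(source: Path, target: Path) -> None:
    """Copy a notebook and set ValidateAlwaysFalse in the copy's top-level metadata.

    The source notebook is only read, never modified.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    notebook = json.loads(source.read_text(encoding="utf-8"))
    # Assumption: the validation key lives in the top-level "metadata" object.
    notebook.setdefault("metadata", {})[VALIDATION_KEY] = "ValidateAlwaysFalse"
    target.write_text(json.dumps(notebook, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
```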

---

### Relevant Files

**To create** (in `domains/external-dependencies/`):
- `README.md`, 3 entry point `.sh`, `externalDependencyCharts.py`, `summary/externalDependenciesSummary.sh`, `summary/report.template.md`

**To copy** (35 `.cypher` + 2 `.ipynb`):
- [cypher/External_Dependencies/](cypher/External_Dependencies/) → `queries/`
- [jupyter/ExternalDependenciesJava.ipynb](jupyter/ExternalDependenciesJava.ipynb), [jupyter/ExternalDependenciesTypescript.ipynb](jupyter/ExternalDependenciesTypescript.ipynb) → `explore/`

**Reference (read-only)**:
- [scripts/executeQueryFunctions.sh](scripts/executeQueryFunctions.sh): `execute_cypher()`, `extractQueryParameter()`
- [scripts/cleanupAfterReportGeneration.sh](scripts/cleanupAfterReportGeneration.sh)
- [scripts/reports/ExternalDependenciesCsv.sh](scripts/reports/ExternalDependenciesCsv.sh): query ordering reference
- [domains/anomaly-detection/anomalyDetectionFeaturePlots.py](domains/anomaly-detection/anomalyDetectionFeaturePlots.py): `Parameters`, `get_file_path()`, Neo4j query pattern
- [domains/anomaly-detection/anomalyDetectionCsv.sh](domains/anomaly-detection/anomalyDetectionCsv.sh): shell boilerplate
- [domains/anomaly-detection/summary/anomalyDetectionSummary.sh](domains/anomaly-detection/summary/anomalyDetectionSummary.sh): template assembly, `<!-- include: -->` pattern
145+
### Verification
146+
147+
1. **Structure**: Domain contains 3 `.sh` entry points, 1 `.py`, ~35 `.cypher` in queries/, 2 `.ipynb` in explore/, summary/ with `.sh` and `.md`
148+
2. **Shell lint**: `shellcheck domains/external-dependencies/*.sh domains/external-dependencies/summary/*.sh`
149+
3. **Python lint**: `python -m py_compile domains/external-dependencies/externalDependencyCharts.py`
150+
4. **Pipeline discovery**: `find domains/ -name "*Csv.sh"`, `*Python.sh`, `*Markdown.sh` all return the new domain's scripts
151+
5. **Notebook metadata**: Both explore/ notebooks contain `"code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse"`
152+
6. **Cypher count**: `ls domains/external-dependencies/queries/*.cypher | wc -l` matches original count
153+
7. **No external changes**: No new files outside `domains/external-dependencies/`
154+
8. **README completeness**: Documents prerequisites, entry points, folder structure
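
The structural checks (items 1 and 6) could also be automated. The following is a rough sketch only: the function name `check_domain_structure` and the default count of 35 are assumptions derived from the numbers in this plan, and the shell commands listed above remain the actual verification steps.

```python
from pathlib import Path


def check_domain_structure(domain_directory: Path, expected_cypher_count: int = 35) -> list:
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    # Entry points are discovered by suffix, mirroring the pipeline's naming convention.
    for suffix in ("Csv.sh", "Python.sh", "Markdown.sh"):
        if not list(domain_directory.glob(f"*{suffix}")):
            problems.append(f"missing entry point *{suffix}")
    cypher_count = len(list((domain_directory / "queries").glob("*.cypher")))
    if cypher_count != expected_cypher_count:
        problems.append(f"expected {expected_cypher_count} .cypher files, found {cypher_count}")
    notebook_count = len(list((domain_directory / "explore").glob("*.ipynb")))
    if notebook_count != 2:
        problems.append(f"expected 2 notebooks in explore/, found {notebook_count}")
    for required in ("README.md", "summary/externalDependenciesSummary.sh", "summary/report.template.md"):
        if not (domain_directory / required).is_file():
            problems.append(f"missing {required}")
    return problems
```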

### Scope Boundaries

**Included**: All 35 Cypher queries, ~38 SVG chart conversions, CSV entry point, Markdown summary, exploration notebooks, prerequisites documentation, package.json queries

**Excluded**: Graph visualizations (GraphViz), moving/deleting originals, reset queries, changes to central pipeline scripts, validation Cypher query
