| 1 | +## Plan: Create External Dependencies Domain |
| 2 | + |
| 3 | +Create a new self-contained `domains/external-dependencies/` domain following the `anomaly-detection` reference pattern. Copy all 35 Cypher queries and the existing CSV shell script into the domain, convert both Jupyter notebooks (Java and TypeScript) into a Python SVG chart generator, copy the notebooks into an `explore/` folder with validation disabled, and assemble a Markdown summary report optimized for human and AI agent consumption. No moves or deletions of originals. No graph visualizations yet. |
| 4 | + |
| 5 | +### Decisions |
| 6 | + |
| 7 | +- **ExternalDependenciesCsv.sh**: Keep both (original in [scripts/reports/ExternalDependenciesCsv.sh](scripts/reports/ExternalDependenciesCsv.sh) unchanged, copy in domain) |
| 8 | +- **Cypher files**: Copy all 35 into domain for self-containment |
| 9 | +- **Charts**: ALL charts from both notebooks converted to Python SVG output (~38 total) |
| 10 | +- **Package.json queries**: Included in domain |
| 11 | +- **Reset folder**: Not included (external type labels are a central enrichment prerequisite) |
| 12 | +- **Markdown summary**: Rich, AI-agent-actionable descriptions with architectural guidance |
| 13 | +- **Entry point naming**: camelCase prefix (`externalDependenciesCsv.sh`, etc.) |
| 14 | + |
| 15 | +### Prerequisites (Documented in README, Not Copied) |
| 16 | + |
| 17 | +The following are provided by the central pipeline and must run *before* this domain: |
| 18 | + |
| 19 | +1. **Neo4j running** with scanned artifacts loaded |
| 20 | +2. **DEPENDS_ON relationships** between types (jQAssistant scan) |
| 21 | +3. **Type labels** ([cypher/Types/](cypher/Types/)): base Java types, built-in types, resolved duplicates |
| 22 | +4. **Weight properties** on DEPENDS_ON ([cypher/DependsOn_Relationship_Weights/](cypher/DependsOn_Relationship_Weights/)): `weight`, `weightInterfaces` |
| 23 | +5. **TypeScript enrichment** ([cypher/Typescript_Enrichment/](cypher/Typescript_Enrichment/)): module properties, namespace, `isNodeModule`, `IS_IMPLEMENTED_IN` resolution, npm linking |
| 24 | +6. **General enrichment** ([cypher/General_Enrichment/](cypher/General_Enrichment/)): file name/extension properties |
| 25 | + |
| 26 | +### Domain Directory Structure |
| 27 | + |
| 28 | +``` |
| 29 | +domains/external-dependencies/ |
| 30 | +├── README.md |
| 31 | +├── externalDependenciesCsv.sh # Entry point: CSV reports (*Csv.sh) |
| 32 | +├── externalDependenciesPython.sh # Entry point: Python charts (*Python.sh) |
| 33 | +├── externalDependenciesMarkdown.sh # Entry point: Markdown summary (*Markdown.sh) |
| 34 | +├── externalDependencyCharts.py # Chart generation: pie, bar, scatter → SVG |
| 35 | +├── explore/ |
| 36 | +│ ├── ExternalDependenciesJava.ipynb # Original notebook (ValidateAlwaysFalse) |
| 37 | +│ └── ExternalDependenciesTypescript.ipynb # Original notebook (ValidateAlwaysFalse) |
| 38 | +├── queries/ |
| 39 | +│ └── (all 35 .cypher files from cypher/External_Dependencies/) |
| 40 | +└── summary/ |
| 41 | + ├── externalDependenciesSummary.sh # Markdown assembly logic |
| 42 | + └── report.template.md # Main report template |
| 43 | +``` |
| 44 | + |
| 45 | +--- |
| 46 | + |
| 47 | +### Steps |
| 48 | + |
| 49 | +#### Phase 1: Scaffolding & Cypher Queries |
| 50 | + |
| 51 | +**1.1** Create domain directory structure with all subdirectories. |
| 52 | + |
| 53 | +**1.2** Copy all 35 `.cypher` files from [cypher/External_Dependencies/](cypher/External_Dependencies/) into `domains/external-dependencies/queries/`. |
| 54 | + |
| 55 | +**1.3** Create `README.md` — domain overview, prerequisites, entry points, folder structure (matching [anomaly-detection README](domains/anomaly-detection/README.md) format). |
| 56 | + |
| 57 | +#### Phase 2: CSV Entry Point Script |
| 58 | + |
| 59 | +**2.1** Create `externalDependenciesCsv.sh`: |
| 60 | +- Follow exact boilerplate pattern of [anomalyDetectionCsv.sh](domains/anomaly-detection/anomalyDetectionCsv.sh) (BASH_SOURCE/CDPATH, `set -o errexit -o pipefail`, script directory resolution) |
| 61 | +- Source `../../scripts/executeQueryFunctions.sh` for `execute_cypher()` and `execute_cypher_queries_until_results()` |
| 62 | +- Source `../../scripts/cleanupAfterReportGeneration.sh` for cleanup |
| 63 | +- Report directory: `reports/external-dependencies` |
| 64 | +- First check/create ExternalType labels, then execute all 24+ queries → CSV files |
| 65 | +- Replicate query ordering from [ExternalDependenciesCsv.sh](scripts/reports/ExternalDependenciesCsv.sh) |
| 66 | + |
| 67 | +#### Phase 3: Python Chart Generation Script (*parallel with Phase 4*) |
| 68 | + |
| 69 | +**3.1** Create `externalDependencyCharts.py`: |
| 70 | +- Follow `Parameters` class pattern from [anomalyDetectionFeaturePlots.py](domains/anomaly-detection/anomalyDetectionFeaturePlots.py): `--report_directory`, `--verbose`, `--language` parameters |
| 71 | +- Neo4j connection using `neo4j` Python driver (same pattern as anomaly detection) |
| 72 | +- Load and execute Cypher `.cypher` files from `queries/` directory |
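
A minimal sketch of how step 3.1 could look, assuming a small `Parameters` class built on `argparse`. The three option names come from the plan; the defaults, the `choices` list, and the helper `load_cypher_query()` are hypothetical and not taken from `anomalyDetectionFeaturePlots.py`.

```python
import argparse
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Parameters:
    """Command line parameters of the chart generator (hypothetical layout)."""

    report_directory: Path
    language: str
    verbose: bool

    @classmethod
    def from_args(cls, args=None) -> "Parameters":
        """Parse the given argument list, or sys.argv when args is None."""
        parser = argparse.ArgumentParser(
            description="Generate external dependency charts as SVG files"
        )
        # Defaults below are assumptions made for this sketch.
        parser.add_argument("--report_directory", type=Path, default=Path("."),
                            help="Directory the SVG files are written to")
        parser.add_argument("--language", choices=["Java", "Typescript"], default="Java",
                            help="Which set of charts to generate")
        parser.add_argument("--verbose", action="store_true",
                            help="Print additional progress information")
        parsed = parser.parse_args(args)
        return cls(parsed.report_directory, parsed.language, parsed.verbose)


def load_cypher_query(queries_directory: Path, file_name: str) -> str:
    """Read one .cypher file from the domain's queries/ directory as text."""
    return (queries_directory / file_name).read_text(encoding="utf-8")
```

Calling `Parameters.from_args(["--language", "Typescript"])` would then select the TypeScript charts, which mirrors the two invocations planned for `externalDependenciesPython.sh`.
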
| 73 | + |
| 74 | +**3.2** Data processing functions (extracted from notebooks): |
| 75 | +- `group_to_others_below_threshold(data_frame, value_column, name_column, threshold)` → DataFrame |
| 76 | +- `filter_values_below_threshold(data_frame, value_column, upper_limit)` → DataFrame |
| 77 | +- `explode_pie_slice(data_frame, index_value, base_value, emphasize_value)` → array |
| 78 | +- `plot_pie_chart(data_frame, title, file_path)` → saves SVG with percentage labels, legend, "others" explode |
| 79 | +- `plot_stacked_bar_chart(pivot_data, title, xlabel, ylabel, file_path)` → saves SVG |
| 80 | +- `plot_scatter_with_annotations(data_frame, x, y, size, color, title, file_path, annotations)` → saves SVG |
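
To illustrate the first helper in the list above, here is one possible shape for `group_to_others_below_threshold`. It assumes that `threshold` is a percentage of the column total and that the grouped row is labelled `"others"`; the notebooks may define both differently, so treat this as a sketch rather than the extracted implementation.

```python
import pandas as pd


def group_to_others_below_threshold(data_frame, value_column, name_column, threshold):
    """Collapse rows whose share of the total is below `threshold` percent
    into a single "others" row (assumed semantics, see lead-in)."""
    total = data_frame[value_column].sum()
    if total == 0:
        # Nothing to compare against: return the relevant columns unchanged.
        return data_frame[[name_column, value_column]].copy()

    share_in_percent = data_frame[value_column] / total * 100.0
    is_small = share_in_percent < threshold

    kept = data_frame.loc[~is_small, [name_column, value_column]]
    if not is_small.any():
        return kept.reset_index(drop=True)

    # Sum up everything below the threshold into one additional row.
    others = pd.DataFrame({
        name_column: ["others"],
        value_column: [data_frame.loc[is_small, value_column].sum()],
    })
    return pd.concat([kept, others], ignore_index=True)
```

Grouping small slices before plotting keeps pie charts readable when many external packages each contribute only a small share of the total.
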
| 81 | + |
| 82 | +**3.3** Java chart generation (**22 charts**): |
| 83 | +- 16 pie charts: top packages × {types, packages} × {overall, drill-down} × {full, second-level} + spread variants |
| 84 | +- 2 stacked bar charts: external packages per artifact (full + second-level) |
| 85 | +- 2 scatter plots: max/median internal package percentage vs external package count |
| 86 | + |
| 87 | +**3.4** TypeScript chart generation (**~16 charts**): |
| 88 | +- 16 pie charts: modules × {elements, modules} × {overall, drill-down} + namespace variants + spread |
| 89 | + |
| 90 | +**3.5** `main()` function: parse arguments, connect Neo4j, generate charts per language, handle "no data" gracefully. |
| 91 | + |
| 92 | +#### Phase 4: Python Entry Point Script (*parallel with Phase 3*) |
| 93 | + |
| 94 | +**4.1** Create `externalDependenciesPython.sh`: |
| 95 | +- Follow pattern of [anomalyDetectionPython.sh](domains/anomaly-detection/anomalyDetectionPython.sh) |
| 96 | +- Set script directory, report directory |
| 97 | +- Call `python externalDependencyCharts.py --language Java --report_directory ...` |
| 98 | +- Call again with `--language Typescript` |
| 99 | + |
| 100 | +#### Phase 5: Markdown Summary |
| 101 | + |
| 102 | +**5.1** Create `summary/report.template.md`: |
| 103 | +- YAML front matter (title, date, model version) |
| 104 | +- Section 1: Executive Overview — total external packages/modules, key frameworks identified |
| 105 | +- Section 2: Java External Dependencies — most used, spread analysis, per-artifact, aggregated (with `<!-- include:... -->` for tables and SVG references) |
| 106 | +- Section 3: TypeScript External Dependencies — modules, namespaces, per-module breakdown, package.json |
| 107 | +- Section 4: Architectural Recommendations — Hexagonal Architecture, Anti-Corruption Layer guidance |
| 108 | +- Section 5: Glossary & Column Definitions — all column descriptions from notebook markdown |
| 109 | + |
| 110 | +**5.2** Create `summary/externalDependenciesSummary.sh`: |
| 111 | +- Follow [anomalyDetectionSummary.sh](domains/anomaly-detection/summary/anomalyDetectionSummary.sh) pattern |
| 112 | +- Read CSV files → generate markdown table snippet includes |
| 113 | +- Conditionally include SVG chart references if files exist |
| 114 | +- Generate front matter (title, date, git tag) |
| 115 | +- Assemble final `external_dependencies_report.md` |
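
The plan implements the summary assembly as a shell script. The following Python sketch only illustrates the "CSV file to Markdown table snippet" transformation it describes. The function name, the `max_rows` limit, and the column names in the usage example are hypothetical.

```python
import csv
import io


def csv_to_markdown_table(csv_text: str, max_rows: int = 10) -> str:
    """Convert CSV text (first row = header) into a Markdown table,
    keeping at most `max_rows` data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:max_rows + 1]

    def format_row(cells):
        # Escape pipes so cell content cannot break the table layout.
        escaped = [cell.replace("|", "\\|") for cell in cells]
        return "| " + " | ".join(escaped) + " |"

    lines = [format_row(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(format_row(row) for row in body)
    return "\n".join(lines)
```

For example, `csv_to_markdown_table('package,types\n"org.slf4j",12\n')` yields a header row, a separator row, and one data row that can be written to an include file referenced from the report template.
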
| 116 | + |
| 117 | +**5.3** Create `externalDependenciesMarkdown.sh`: |
| 118 | +- Thin delegator to `summary/externalDependenciesSummary.sh` (matching [anomalyDetectionMarkdown.sh](domains/anomaly-detection/anomalyDetectionMarkdown.sh) pattern) |
| 119 | + |
| 120 | +#### Phase 6: Exploration Notebooks |
| 121 | + |
| 122 | +**6.1** Copy [jupyter/ExternalDependenciesJava.ipynb](jupyter/ExternalDependenciesJava.ipynb) → `explore/ExternalDependenciesJava.ipynb`, add `"code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse"` to metadata (matching [AnomalyDetectionExploration.ipynb](domains/anomaly-detection/explore/AnomalyDetectionExploration.ipynb)). |
| 123 | + |
| 124 | +**6.2** Copy [jupyter/ExternalDependenciesTypescript.ipynb](jupyter/ExternalDependenciesTypescript.ipynb) → `explore/ExternalDependenciesTypescript.ipynb`, same metadata update. |
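
Because `.ipynb` files are plain JSON, the metadata update in steps 6.1 and 6.2 can be scripted. The sketch below assumes the key belongs in the notebook's top-level `metadata` object; the function names are hypothetical.

```python
import json
from pathlib import Path

VALIDATION_KEY = "code_graph_analysis_pipeline_data_validation"


def disable_validation(notebook: dict) -> dict:
    """Set the validation marker in the notebook's top-level metadata
    (assumed location) and return the same dictionary."""
    notebook.setdefault("metadata", {})[VALIDATION_KEY] = "ValidateAlwaysFalse"
    return notebook


def copy_notebook_with_validation_disabled(source: Path, target: Path) -> None:
    """Copy a notebook to `target` with validation disabled; the source stays untouched."""
    notebook = json.loads(source.read_text(encoding="utf-8"))
    target.parent.mkdir(parents=True, exist_ok=True)
    serialized = json.dumps(disable_validation(notebook), indent=1, ensure_ascii=False)
    target.write_text(serialized + "\n", encoding="utf-8")
```

Writing to a separate target path keeps the originals in `jupyter/` unchanged, in line with the "no moves or deletions" decision.
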
| 125 | + |
| 126 | +--- |
| 127 | + |
| 128 | +### Relevant Files |
| 129 | + |
| 130 | +**To create** (in `domains/external-dependencies/`): |
| 131 | +- `README.md`, 3 entry point `.sh`, `externalDependencyCharts.py`, `summary/externalDependenciesSummary.sh`, `summary/report.template.md` |
| 132 | + |
| 133 | +**To copy** (35 `.cypher` + 2 `.ipynb`): |
| 134 | +- [cypher/External_Dependencies/](cypher/External_Dependencies/) → `queries/` |
| 135 | +- [jupyter/ExternalDependenciesJava.ipynb](jupyter/ExternalDependenciesJava.ipynb), [jupyter/ExternalDependenciesTypescript.ipynb](jupyter/ExternalDependenciesTypescript.ipynb) → `explore/` |
| 136 | + |
| 137 | +**Reference (read-only)**: |
| 138 | +- [scripts/executeQueryFunctions.sh](scripts/executeQueryFunctions.sh) — `execute_cypher()`, `extractQueryParameter()` |
| 139 | +- [scripts/cleanupAfterReportGeneration.sh](scripts/cleanupAfterReportGeneration.sh) |
| 140 | +- [scripts/reports/ExternalDependenciesCsv.sh](scripts/reports/ExternalDependenciesCsv.sh) — query ordering reference |
| 141 | +- [domains/anomaly-detection/anomalyDetectionFeaturePlots.py](domains/anomaly-detection/anomalyDetectionFeaturePlots.py) — `Parameters`, `get_file_path()`, Neo4j query pattern |
| 142 | +- [domains/anomaly-detection/anomalyDetectionCsv.sh](domains/anomaly-detection/anomalyDetectionCsv.sh) — shell boilerplate |
| 143 | +- [domains/anomaly-detection/summary/anomalyDetectionSummary.sh](domains/anomaly-detection/summary/anomalyDetectionSummary.sh) — template assembly, `<!-- include: -->` pattern |
| 144 | + |
| 145 | +### Verification |
| 146 | + |
| 147 | +1. **Structure**: Domain contains 3 `.sh` entry points, 1 `.py`, 35 `.cypher` in queries/, 2 `.ipynb` in explore/, summary/ with `.sh` and `.md` |
| 148 | +2. **Shell lint**: `shellcheck domains/external-dependencies/*.sh domains/external-dependencies/summary/*.sh` |
| 149 | +3. **Python syntax check**: `python -m py_compile domains/external-dependencies/externalDependencyCharts.py` |
| 150 | +4. **Pipeline discovery**: `find domains/ -name "*Csv.sh"` and the equivalent searches for `*Python.sh` and `*Markdown.sh` each return the new domain's script |
| 151 | +5. **Notebook metadata**: Both explore/ notebooks contain `"code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse"` |
| 152 | +6. **Cypher count**: `ls domains/external-dependencies/queries/*.cypher | wc -l` matches original count |
| 153 | +7. **No external changes**: No new files outside `domains/external-dependencies/` |
| 154 | +8. **README completeness**: Documents prerequisites, entry points, folder structure |
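
The structural checks above (items 1 and 6) could also be automated. This is a hedged sketch: the function name and the problem messages are made up for illustration, and it works on relative path strings so it can be tried without touching the file system.

```python
from pathlib import PurePosixPath

ENTRY_POINT_SUFFIXES = ("Csv.sh", "Python.sh", "Markdown.sh")


def find_structure_problems(relative_paths, expected_cypher_count=35):
    """Return a list of human-readable problems for the given file paths,
    which are expected to be relative to domains/external-dependencies/."""
    paths = [PurePosixPath(path) for path in relative_paths]
    top_level = [path.name for path in paths if len(path.parts) == 1]
    problems = []

    for suffix in ENTRY_POINT_SUFFIXES:
        if not any(name.endswith(suffix) for name in top_level):
            problems.append(f"missing entry point *{suffix}")

    cypher_count = sum(
        1 for path in paths
        if path.parts[:1] == ("queries",) and path.suffix == ".cypher"
    )
    if cypher_count != expected_cypher_count:
        problems.append(
            f"expected {expected_cypher_count} .cypher files in queries/, found {cypher_count}"
        )

    notebook_count = sum(
        1 for path in paths
        if path.parts[:1] == ("explore",) and path.suffix == ".ipynb"
    )
    if notebook_count != 2:
        problems.append(f"expected 2 notebooks in explore/, found {notebook_count}")

    if "README.md" not in top_level:
        problems.append("missing README.md")
    return problems
```

The input list can be collected with `[p.relative_to(domain).as_posix() for p in domain.rglob("*") if p.is_file()]`, where `domain` is a `pathlib.Path` pointing at the new domain directory.
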
| 155 | + |
| 156 | +### Scope Boundaries |
| 157 | + |
| 158 | +**Included**: All 35 Cypher queries, ~38 SVG chart conversions, CSV entry point, Markdown summary, exploration notebooks, prerequisites documentation, package.json queries |
| 159 | + |
| 160 | +**Excluded**: Graph visualizations (GraphViz), moving/deleting originals, reset queries, changes to central pipeline scripts, validation Cypher query |