Commit d11ea36

Merge pull request #566 from JohT/feature/extend-internal-dependencies-domain
Extend internal-dependencies domain
2 parents 2bef838 + 2c7ba64 commit d11ea36

49 files changed

Lines changed: 5334 additions & 2 deletions

Lines changed: 225 additions & 0 deletions
@@ -0,0 +1,225 @@
1+
# Plan: Internal Dependencies Domain — Additional Reports
2+
3+
## Domain Fit Assessment
4+
5+
| Addition | Fits? | Rationale |
6+
|---|---|---|
7+
| DependenciesGraphExploration (Java/TS) | ✅ YES | Visualizes internal dependency hierarchy — core domain content |
8+
| OOP Design Metrics | ✅ YES | Instability + Abstractness measure coupling quality — dependency design metrics defined in terms of dependency ratios |
9+
| Visibility Metrics | ✅ YES | Public API surface encapsulation shapes how modules/packages can depend on each other |
10+
| Wordcloud (code names) | ⚠️ BORDERLINE | General vocabulary analysis, not dependency data; no better domain exists currently |
11+
12+
---
13+
14+
## Extends: plan-internal_dependencies_domain.prompt.md
15+
16+
All steps below are ADDITIONS to that plan. Phase numbering continues from original.
17+
18+
---
19+
20+
## New Files (delta from original plan)
21+
22+
**New Python scripts** (in `domains/internal-dependencies/`):
23+
- `objectOrientedDesignMetricsCharts.py` — scatter + bar charts for instability/abstractness/main sequence
24+
- `visibilityMetricsCharts.py` — scatter subplots for visibility percentiles
25+
- `wordcloudChart.py` — code unit names wordcloud as SVG
26+
27+
**New query directories**:
28+
- `queries/ood-metrics/` — 29 files from `cypher/Metrics/`
29+
- `queries/visibility/` — 4 files from `cypher/Visibility/`
30+
- Add to `queries/exploration/`: `Words_for_universal_Wordcloud.cypher`
31+
32+
**New explore notebooks** (7 additional, copies of jupyter/):
33+
- `explore/DependenciesGraphExplorationJava.ipynb`
34+
- `explore/DependenciesGraphExplorationTypescript.ipynb`
35+
- `explore/ObjectOrientedDesignMetricsJava.ipynb`
36+
- `explore/ObjectOrientedDesignMetricsTypescript.ipynb`
37+
- `explore/VisibilityMetricsJava.ipynb`
38+
- `explore/VisibilityMetricsTypescript.ipynb`
39+
- `explore/Wordcloud.ipynb`
40+
41+
**Updated files**:
42+
- `internalDependenciesCsv.sh` — add OOD metrics + visibility metrics sections
43+
- `internalDependenciesPython.sh` — call 3 new chart scripts
44+
- `summary/report.template.md` — add OOD metrics, visibility, wordcloud sections
45+
- `COPIED_FILES.md` — add all new original→copy mappings
46+
47+
---
48+
49+
## Steps
50+
51+
### Phase 1 Extension: Cypher Queries
52+
53+
**1.10** Copy 29 files from `cypher/Metrics/` into `queries/ood-metrics/`:
54+
- Java (without subpackages): `Get_Incoming_Java_Package_Dependencies.cypher`, `Set_Incoming_Java_Package_Dependencies.cypher`, `Get_Outgoing_Java_Package_Dependencies.cypher`, `Set_Outgoing_Java_Package_Dependencies.cypher`, `Get_Instability_for_Java.cypher`, `Calculate_and_set_Instability_for_Java.cypher`, `Get_Abstractness_for_Java.cypher`, `Calculate_and_set_Abstractness_for_Java.cypher`, `Calculate_distance_between_abstractness_and_instability_for_Java.cypher`
55+
- Java (including subpackages): same 9 files with `_Including_Subpackages` / `_including_subpackages` suffix variants
56+
- TypeScript: `Get_Incoming_Typescript_Module_Dependencies.cypher`, `Set_Incoming_Typescript_Module_Dependencies.cypher`, `Get_Outgoing_Typescript_Module_Dependencies.cypher`, `Set_Outgoing_Typescript_Module_Dependencies.cypher`, `Get_Instability_for_Typescript.cypher`, `Calculate_and_set_Instability_for_Typescript.cypher`, `Get_Abstractness_for_Typescript.cypher`, `Calculate_and_set_Abstractness_for_Typescript.cypher`, `Calculate_distance_between_abstractness_and_instability_for_Typescript.cypher`
57+
- Shared prerequisite: `Count_and_set_abstract_types.cypher` (required before abstractness calculation)
58+
59+
Exact file count: verify with `ls cypher/Metrics/ | wc -l` to ensure no files are missed; the CSV script uses a specific subset, notebooks use more — copy all needed by either.
60+
61+
**1.11** Copy all 4 files from `cypher/Visibility/` into `queries/visibility/`:
62+
- `Global_relative_visibility_statistics_for_types.cypher`
63+
- `Relative_visibility_public_types_to_all_types_per_package.cypher`
64+
- `Global_relative_visibility_statistics_for_elements_for_Typescript.cypher`
65+
- `Relative_visibility_exported_elements_to_all_elements_per_module_for_Typescript.cypher`
66+
67+
**1.12** Copy `cypher/Overview/Words_for_universal_Wordcloud.cypher` → `queries/exploration/` (explore notebook reference + wordcloud chart).
68+
69+
### Phase 2 Extension: CSV Entry Point
70+
71+
**2.x** Extend `internalDependenciesCsv.sh` — add after existing topological sort section:
72+
73+
**OOP Metrics block** (follow `scripts/reports/ObjectOrientedDesignMetricsCsv.sh` ordering):
74+
- Java without subpackages (5 queries): incoming, outgoing, instability, abstractness, main-sequence distance → `Java_Package/`
75+
- Java with subpackages (5 queries): same set with `_Including_Subpackages` suffix → `Java_Package/`
76+
- TypeScript (5 queries): TypeScript equivalents → `Typescript_Module/`
77+
- Note: `Count_and_set_abstract_types.cypher` must run before abstractness queries; check if `ObjectOrientedDesignMetricsCsv.sh` runs it explicitly and replicate that order.
78+
79+
**Visibility Metrics block** (follow `scripts/reports/VisibilityMetricsCsv.sh` ordering):
80+
- Java: global visibility stats per artifact → `Java_Artifact/`, per-package visibility → `Java_Package/`
81+
- TypeScript: global stats per project → `Typescript_Module/`, per-module visibility → `Typescript_Module/`
82+
83+
### Phase 3 Extension: Python Chart Scripts (*parallel with Phase 4 extension*)
84+
85+
**3.6** Create `objectOrientedDesignMetricsCharts.py`:
86+
- `Parameters` class: `--report_directory`, `--queries_directory` (default: `queries/ood-metrics/`), `--verbose`
87+
- Run `Count_and_set_abstract_types.cypher` first (prerequisite for abstractness)
88+
- Run `Calculate_and_set_*` queries (idempotent write-back to graph), then read results
89+
- Chart functions (all save SVG):
90+
- `plot_top_dependencies_bar(data, title, x_col, y_col, file_path)` — horizontal bar, top 30 packages by incoming/outgoing deps
91+
- `plot_main_sequence_scatter(data, title, file_path)` — scatter: X=abstractness, Y=instability; point size=type count; color=distance from main sequence (green=near, red=far); green dashed diagonal reference line
92+
- Sections: Java packages (without subpackages), Java packages (including subpackages), TypeScript modules
93+
- Output to: `Java_Package/` and `Typescript_Module/` subdirs within report directory
94+
- Handle "no data" gracefully with warning + skip
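The `Parameters` class described above could look roughly like the following. This is a minimal sketch based only on the three options named in this plan; the dataclass layout, the `from_command_line` helper and the empty default for `--report_directory` are assumptions, not the pattern actually used in `externalDependencyCharts.py`.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameters:
    """Command line parameters of a chart script (hypothetical layout)."""

    report_directory: str
    queries_directory: str
    verbose: bool

    @classmethod
    def from_command_line(cls, arguments=None, default_queries_directory="queries/ood-metrics/"):
        # arguments=None lets argparse read sys.argv; a list can be passed for testing.
        parser = argparse.ArgumentParser(description="Creates SVG charts for design metrics")
        parser.add_argument("--report_directory", default="",
                            help="Directory the SVG charts are written to (default is an assumption)")
        parser.add_argument("--queries_directory", default=default_queries_directory,
                            help="Directory containing the Cypher queries")
        parser.add_argument("--verbose", action="store_true", help="Enable verbose output")
        parsed = parser.parse_args(arguments)
        return cls(parsed.report_directory, parsed.queries_directory, parsed.verbose)
```

The same class could serve `visibilityMetricsCharts.py` and `wordcloudChart.py` by passing a different `default_queries_directory`.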
95+
96+
**3.7** Create `visibilityMetricsCharts.py`:
97+
- `Parameters` class: `--report_directory`, `--queries_directory` (default: `queries/visibility/`), `--verbose`
98+
- Chart functions:
99+
- `plot_visibility_scatter(data, title, file_path, percentile_col, y_col)` — scatter: X=visibility percentile (25/50/75), Y=package/module count (log scale); custom Y ticks: 1, 2, 5, 10, 20, 50, 100, 200, 500, 1K, 2K, 5K, 10K
100+
- `plot_visibility_subplots(java_data, ts_data, report_dir)` — 3-subplot layout matching notebook style
101+
- Output to: `Java_Package/` for Java, `Typescript_Module/` for TypeScript
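The custom Y ticks listed above can be produced without hard-coding the label strings. The sketch below is illustrative only; `VISIBILITY_COUNT_TICKS` and `format_count_tick` are hypothetical names, and the actual notebook may build its labels differently.

```python
# Tick positions for the logarithmic Y axis, as listed in the plan.
VISIBILITY_COUNT_TICKS = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1_000, 2_000, 5_000, 10_000]


def format_count_tick(value: int) -> str:
    """Abbreviates whole thousands with 'K' (2000 -> '2K'), leaves other values unchanged."""
    if value >= 1_000 and value % 1_000 == 0:
        return f"{value // 1_000}K"
    return str(value)


tick_labels = [format_count_tick(tick) for tick in VISIBILITY_COUNT_TICKS]

# With a matplotlib axis object the ticks could then be applied like this:
#   axis.set_yscale("log")
#   axis.set_yticks(VISIBILITY_COUNT_TICKS)
#   axis.set_yticklabels(tick_labels)
```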
102+
103+
**3.8** Create `wordcloudChart.py`:
104+
- `Parameters` class: `--report_directory`, `--queries_directory` (default: `queries/exploration/`), `--verbose`
105+
- Run `Words_for_universal_Wordcloud.cypher` to get word list
106+
- Apply same stopwords list as Wordcloud.ipynb (builder, exception, abstract, helper, util, callback, factory, result, handler, test, impl, plugin, etc.)
107+
- Generate wordcloud using `WordCloud.to_svg()` for pure-vector SVG output (800x800, max 600 words, viridis colormap)
108+
- Output: `reports/internal-dependencies/CodeNamesWordcloud.svg`
109+
- Handle "no data" (no nodes with names) with warning + skip
110+
111+
### Phase 4 Extension: Python Entry Point
112+
113+
**4.1 Update** `internalDependenciesPython.sh` — after existing `pathFindingCharts.py` call, add:
114+
```
115+
python objectOrientedDesignMetricsCharts.py --report_directory "${FULL_REPORT_DIRECTORY}" ${verboseMode}
116+
python visibilityMetricsCharts.py --report_directory "${FULL_REPORT_DIRECTORY}" ${verboseMode}
117+
python wordcloudChart.py --report_directory "${FULL_REPORT_DIRECTORY}" ${verboseMode}
118+
```
119+
Follow same pattern as existing call in `externalDependenciesPython.sh`.
120+
121+
### Phase 6 Extension: Markdown Summary
122+
123+
**6.1 Update** `summary/report.template.md` — add new sections after existing content:
124+
125+
**New Section: OOP Design Metrics**
126+
- Introductory paragraph: Condense from notebook — *"Based on Robert C. Martin's stable dependencies principle. **Instability** = outgoing/(incoming+outgoing): 0 = fully stable (many dependents, hard to change), 1 = fully unstable (no dependents, easy to change). **Abstractness** = abstract types / total types: 0 = fully concrete, 1 = fully abstract. The **Main Sequence** diagonal (A + I = 1) defines the ideal balance. Distance from main sequence measures how far a package deviates from this ideal: near 0 = well-balanced, near 1 = problematic ('Zone of Pain' = concrete+stable; 'Zone of Uselessness' = abstract+unstable)."*
127+
- Java packages (without subpackages): table links + scatter chart references with 1-3 sentence descriptions
128+
- Java packages (including subpackages): same
129+
- TypeScript modules: same
130+
- Glossary additions: `Instability`, `Abstractness`, `Distance from Main Sequence`, `Zone of Pain`, `Zone of Uselessness`
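As a worked example of the definitions in the introductory paragraph, the three metrics can be written as plain functions. This is a sketch of the textbook formulas, not a copy of the Cypher queries; in particular, returning 0.0 for a package without any dependencies or types is an assumption made here to avoid division by zero.

```python
def instability(incoming: int, outgoing: int) -> float:
    """Instability I = outgoing / (incoming + outgoing)."""
    total = incoming + outgoing
    return outgoing / total if total else 0.0  # assumption: isolated package counts as stable


def abstractness(abstract_types: int, total_types: int) -> float:
    """Abstractness A = abstract types / total types."""
    return abstract_types / total_types if total_types else 0.0  # assumption: empty package is concrete


def distance_from_main_sequence(abstractness_value: float, instability_value: float) -> float:
    """Distance D = |A + I - 1|, which is 0 on the main sequence diagonal."""
    return abs(abstractness_value + instability_value - 1.0)


# A concrete package (A = 0) that many others depend on (I = 0) sits in the "Zone of Pain".
zone_of_pain_distance = distance_from_main_sequence(0.0, 0.0)  # 1.0
# A fully abstract package (A = 1) that nothing depends on (I = 1) sits in the "Zone of Uselessness".
zone_of_uselessness_distance = distance_from_main_sequence(1.0, 1.0)  # 1.0
```

For example, a package with 3 incoming and 1 outgoing dependencies has I = 0.25; with 1 abstract type out of 4 it has A = 0.25, which gives D = 0.5.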
131+
132+
**New Section: Visibility Metrics**
133+
- Introductory paragraph: *"Measures the ratio of publicly visible types/elements to all types/elements per package or module. High visibility means most internals are exposed (low encapsulation). The percentile25/50/75 metrics per artifact show whether low-visibility packages are the norm or outliers within each artifact."*
134+
- Java scatter subplot references with chart description
135+
- TypeScript scatter subplot references
136+
- Linked tables: top 40 packages/modules with lowest encapsulation
137+
138+
**New Section: Code Vocabulary (Wordcloud)**
139+
- Introductory paragraph: *"Words derived from code element names across all artifacts/modules (types, methods, variables). Constructed by splitting camelCase/snake_case identifiers, filtering common stopwords (util, helper, factory, etc.), and weighting by frequency. Larger words appear more often in the codebase — revealing dominant concerns and naming patterns."*
140+
- Wordcloud SVG reference (conditional include if file exists)
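The word construction described in the introductory paragraph (split identifiers, filter stopwords, weight by frequency) can be sketched in plain Python. The regular expression, the function names and the short stopword set are assumptions for illustration; the real stopword list lives in `Wordcloud.ipynb`, and the Cypher query may already do part of the splitting.

```python
import re
from collections import Counter

# Subset of the stopwords named in this plan; the notebook's full list is longer.
STOPWORDS = {"builder", "exception", "abstract", "helper", "util", "callback",
             "factory", "result", "handler", "test", "impl", "plugin"}

# Matches acronyms ("HTTP"), capitalized words ("Response") and lowercase words ("get").
# Underscores and digits never match, so they act as separators.
WORD_PATTERN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_identifier(identifier: str) -> list:
    """Splits a camelCase or snake_case identifier into lowercase words."""
    return [word.lower() for word in WORD_PATTERN.findall(identifier)]


def count_words(identifiers, stopwords=STOPWORDS) -> Counter:
    """Counts how often each non-stopword occurs across all identifiers."""
    counter = Counter()
    for identifier in identifiers:
        counter.update(word for word in split_identifier(identifier) if word not in stopwords)
    return counter
```

The resulting frequencies could be passed on to the `wordcloud` library, for example through `WordCloud.generate_from_frequencies`, if that matches how the notebook builds its cloud.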
141+
142+
**6.2 Update** `summary/internalDependenciesSummary.sh` — add:
143+
- Execute OOD metrics read queries → Markdown table includes (instability/abstractness/distance tables for Java + TypeScript)
144+
- Execute visibility read queries → Markdown table includes (tables of the packages/modules with the lowest encapsulation, i.e. the highest share of visible types/elements)
145+
- Conditional SVG chart references (OOD scatter plots, visibility subplots, wordcloud)
146+
147+
### Phase 7 Extension: Exploration Notebooks (7 additional)
148+
149+
**7.5** Copy `jupyter/DependenciesGraphExplorationJava.ipynb` → `explore/DependenciesGraphExplorationJava.ipynb`
150+
- Already has `ValidateAlwaysFalse` — no metadata change needed
151+
152+
**7.6** Copy `jupyter/DependenciesGraphExplorationTypescript.ipynb` → `explore/DependenciesGraphExplorationTypescript.ipynb`
153+
- Already has `ValidateAlwaysFalse` — no metadata change needed
154+
155+
**7.7** Copy `jupyter/ObjectOrientedDesignMetricsJava.ipynb` → `explore/ObjectOrientedDesignMetricsJava.ipynb`
156+
- Change `"code_graph_analysis_pipeline_data_validation"` from `"ValidateJavaPackageDependencies"` → `"ValidateAlwaysFalse"`
157+
158+
**7.8** Copy `jupyter/ObjectOrientedDesignMetricsTypescript.ipynb` → `explore/ObjectOrientedDesignMetricsTypescript.ipynb`
159+
- Change `"ValidateTypescriptModuleDependencies"` → `"ValidateAlwaysFalse"`
160+
161+
**7.9** Copy `jupyter/VisibilityMetricsJava.ipynb` → `explore/VisibilityMetricsJava.ipynb`
162+
- Change `"ValidateJavaTypes"` → `"ValidateAlwaysFalse"`
163+
164+
**7.10** Copy `jupyter/VisibilityMetricsTypescript.ipynb` → `explore/VisibilityMetricsTypescript.ipynb`
165+
- Change `"ValidateTypescriptModuleDependencies"` → `"ValidateAlwaysFalse"`
166+
167+
**7.11** Copy `jupyter/Wordcloud.ipynb` → `explore/Wordcloud.ipynb`
168+
- Change data validation to `"ValidateAlwaysFalse"`
169+
- Note in COPIED_FILES.md: only "Wordcloud of names in code" section (`Words_for_universal_Wordcloud.cypher`) is replicated in `wordcloudChart.py`; the git authors wordcloud section is explore-only
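The metadata changes in steps 7.7 to 7.11 could be scripted instead of edited by hand. The sketch below assumes that `code_graph_analysis_pipeline_data_validation` is stored in the notebook's top-level `metadata` object, which this plan does not state explicitly; the function names are hypothetical.

```python
import json
from pathlib import Path

VALIDATION_KEY = "code_graph_analysis_pipeline_data_validation"


def set_data_validation(notebook: dict, validation: str = "ValidateAlwaysFalse") -> dict:
    """Sets the data validation entry in the notebook-level metadata (assumed location)."""
    notebook.setdefault("metadata", {})[VALIDATION_KEY] = validation
    return notebook


def update_notebook_file(path: Path) -> None:
    """Reads a copied .ipynb file (plain JSON), updates its metadata and writes it back."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    set_data_validation(notebook)
    path.write_text(json.dumps(notebook, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
```

Calling `update_notebook_file(Path("explore/VisibilityMetricsJava.ipynb"))` after the copy would then replace `"ValidateJavaTypes"` with `"ValidateAlwaysFalse"`.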
170+
171+
---
172+
173+
## Relevant Files (delta)
174+
175+
**To create**:
176+
- `domains/internal-dependencies/objectOrientedDesignMetricsCharts.py`
177+
- `domains/internal-dependencies/visibilityMetricsCharts.py`
178+
- `domains/internal-dependencies/wordcloudChart.py`
179+
180+
**To copy** (new Cypher, ~34 files):
181+
- `cypher/Metrics/` → `queries/ood-metrics/` (29 files: 9 Java + 9 Java-subpackages + 9 TypeScript + `Count_and_set_abstract_types.cypher` + 1 TBD from `ObjectOrientedDesignMetricsCsv.sh` reference)
182+
- `cypher/Visibility/` (all 4 files) → `queries/visibility/`
183+
- `cypher/Overview/Words_for_universal_Wordcloud.cypher` → `queries/exploration/`
184+
185+
**To copy** (new notebooks, 7 files):
186+
- `jupyter/DependenciesGraphExplorationJava.ipynb` → `explore/`
187+
- `jupyter/DependenciesGraphExplorationTypescript.ipynb` → `explore/`
188+
- `jupyter/ObjectOrientedDesignMetricsJava.ipynb` → `explore/`
189+
- `jupyter/ObjectOrientedDesignMetricsTypescript.ipynb` → `explore/`
190+
- `jupyter/VisibilityMetricsJava.ipynb` → `explore/`
191+
- `jupyter/VisibilityMetricsTypescript.ipynb` → `explore/`
192+
- `jupyter/Wordcloud.ipynb` → `explore/`
193+
194+
**To modify**:
195+
- `domains/internal-dependencies/internalDependenciesCsv.sh` — add OOD metrics + visibility sections
196+
- `domains/internal-dependencies/internalDependenciesPython.sh` — add 3 new chart script calls
197+
- `domains/internal-dependencies/summary/report.template.md` — add 3 new sections
198+
- `domains/internal-dependencies/summary/internalDependenciesSummary.sh` — add table/chart includes
199+
- `domains/internal-dependencies/COPIED_FILES.md` — add new mappings
200+
201+
**Reference** (read-only, for Python chart implementation):
202+
- `scripts/reports/ObjectOrientedDesignMetricsCsv.sh` — query ordering + output file names
203+
- `scripts/reports/VisibilityMetricsCsv.sh` — query ordering + output file names
204+
- `jupyter/ObjectOrientedDesignMetricsJava.ipynb` — scatter plot implementation details
205+
- `jupyter/VisibilityMetricsJava.ipynb` — 3-subplot scatter implementation + Y-axis tick list
206+
- `jupyter/Wordcloud.ipynb` — stopwords list + wordcloud parameters (800x800, max 600 words, viridis)
207+
- `domains/external-dependencies/externalDependencyCharts.py` — Parameters class pattern
208+
209+
---
210+
211+
## Verification (delta)
212+
213+
1. **Cypher count**: `find domains/internal-dependencies/queries/ -name "*.cypher" | wc -l` = 43 (original) + ~34 (new) ≈ 77
214+
2. **Python compile**: `python -m py_compile domains/internal-dependencies/objectOrientedDesignMetricsCharts.py domains/internal-dependencies/visibilityMetricsCharts.py domains/internal-dependencies/wordcloudChart.py`
215+
3. **Shell lint**: `shellcheck` on updated `internalDependenciesCsv.sh` and `internalDependenciesPython.sh`
216+
4. **Notebook metadata**: All 11 explore/ notebooks contain `"code_graph_analysis_pipeline_data_validation": "ValidateAlwaysFalse"`
217+
5. **No external changes**: No modifications outside `domains/internal-dependencies/`
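Checks 1 and 4 can also be run from Python, for example as part of a small verification script. This is a sketch under the same assumption as before, namely that the validation entry sits in the notebook-level `metadata`; both function names are made up for illustration.

```python
import json
from pathlib import Path

VALIDATION_KEY = "code_graph_analysis_pipeline_data_validation"


def count_cypher_files(queries_directory: Path) -> int:
    """Counts all *.cypher files below the given directory (like find ... | wc -l)."""
    return sum(1 for _ in queries_directory.rglob("*.cypher"))


def notebooks_without_always_false(explore_directory: Path) -> list:
    """Returns the names of notebooks whose data validation is not 'ValidateAlwaysFalse'."""
    failing = []
    for notebook_path in sorted(explore_directory.glob("*.ipynb")):
        metadata = json.loads(notebook_path.read_text(encoding="utf-8")).get("metadata", {})
        if metadata.get(VALIDATION_KEY) != "ValidateAlwaysFalse":
            failing.append(notebook_path.name)
    return failing
```

An empty list from `notebooks_without_always_false(Path("domains/internal-dependencies/explore"))` would correspond to check 4 passing.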
218+
219+
---
220+
221+
## Further Considerations
222+
223+
1. **Wordcloud domain fit**: Code vocabulary analysis is not directly a dependency metric. If an Overview domain is planned later, `wordcloudChart.py` and `explore/Wordcloud.ipynb` could move there. For now, including it is the pragmatic choice.
224+
2. **Count_and_set_abstract_types prerequisite**: Verify whether `ObjectOrientedDesignMetricsCsv.sh` runs `Count_and_set_abstract_types.cypher` explicitly before abstractness queries — replicate that order in the domain CSV script. If it doesn't (i.e., it's expected to be a prior pipeline step), document it as a prerequisite in PREREQUISITES.md instead of running it inline.
225+
3. **Wordcloud SVG method**: `WordCloud.to_svg()` produces pure-vector SVG. If the installed `wordcloud` library version doesn't support it, fall back to rendering to a matplotlib figure and saving as SVG (a valid SVG file, but the wordcloud inside it is an embedded raster image rather than vector text).
