
Improve plots and fix exploratory notebook issues #596

Merged
JohT merged 8 commits into main from feature/improve-plots on May 17, 2026

Conversation

JohT changed the title from "Feature/improve plots" to "Improve plots and fix exploratory notebook issues" on May 16, 2026
JohT self-assigned this on May 16, 2026
JohT force-pushed the feature/improve-plots branch from e8d33a3 to 6705027 on May 16, 2026 18:41
JohT requested a review from Copilot on May 16, 2026 18:41
Contributor

Copilot AI left a comment


Pull request overview

This PR makes a set of fixes and improvements to exploratory Jupyter notebooks and the related chart-generation Python scripts. It adds nbformat so Plotly can render figures in notebooks, repoints notebooks at their new queries/ folders (and ../../../cypher/... for shared Cypher), adds a new cyclomatic-complexity distribution chart + CSV report for the Java domain, drops the no-longer-needed openTSNE dimensionality reduction in favour of UMAP, and introduces a CI workflow that smoke-executes every explore/*.ipynb to catch import/syntax regressions.
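
For orientation, the relative paths mentioned above can be checked lexically. Assuming a notebook in domains/java/explore references its domain queries as ../queries/... (the table below confirms the query lives in domains/java/queries/method-metrics/), those paths land in the domain's queries folder, while ../../../cypher/... climbs three levels to the repository root. The helper query_path is hypothetical and not part of the repository; Example.cypher in the usage note is a placeholder name.

```python
import os
from pathlib import Path


def query_path(notebook_directory: str, relative_path: str) -> Path:
    """Normalize a Cypher path written relative to a notebook's directory.

    Only collapses '..' segments lexically; it does not touch the file system,
    so it works without a checkout of the repository.
    """
    return Path(os.path.normpath(os.path.join(notebook_directory, relative_path)))
```

For example, query_path("domains/java/explore", "../../../cypher/Example.cypher") yields cypher/Example.cypher, which is the shared Cypher folder at the repository root.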

Changes:

  • Notebook hygiene: type annotations, path corrections, removal of dead helpers (get_plotly_figure_write_image_settings, t-SNE path), stringification of date columns for Plotly JSON serialization, NEO4J_INITIAL_PASSWORD validation.
  • New chart pipeline: Cyclomatic_Method_Complexity_Distribution.cypher + CSV export + normalized per-artifact line chart in javaCharts.py; pie charts in externalDependencyCharts.py now plot the full dataset and rely on threshold filtering instead of pre-slicing the data with head(20).
  • New internal-check-notebooks.yml workflow that executes notebooks with nbconvert --allow-errors and fails only on ModuleNotFoundError/ImportError/SyntaxError.
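
The smoke-check rule in the last bullet (execute with --allow-errors, then fail only on import or syntax errors) can be sketched in Python as follows. This is a hedged sketch, not the workflow's actual implementation, which is not shown here: the names FATAL_ERROR_NAMES, find_fatal_errors and main are hypothetical. It assumes the executed notebook is saved as nbformat v4 JSON, where nbconvert with --allow-errors records each raised exception as a code-cell output with output_type "error" and the exception class name in ename.

```python
import json
import sys
from pathlib import Path

# Exception class names that fail the smoke check. ModuleNotFoundError is a
# subclass of ImportError, but error outputs store only the concrete class
# name, so both are listed explicitly. Any other error is tolerated.
FATAL_ERROR_NAMES = {"ModuleNotFoundError", "ImportError", "SyntaxError"}


def find_fatal_errors(notebook: dict) -> list[str]:
    """Return 'ename: evalue' for each fatal error output in an executed nbformat v4 notebook."""
    fatal: list[str] = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # only code cells have outputs
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error" and output.get("ename") in FATAL_ERROR_NAMES:
                fatal.append(f"{output['ename']}: {output.get('evalue', '')}")
    return fatal


def main(executed_notebook_paths: list[str]) -> int:
    """Print fatal errors to stderr and return a process exit code (0 = pass)."""
    exit_code = 0
    for path in executed_notebook_paths:
        notebook = json.loads(Path(path).read_text(encoding="utf-8"))
        for error in find_fatal_errors(notebook):
            print(f"{path}: {error}", file=sys.stderr)
            exit_code = 1
    return exit_code
```

For example, after jupyter nbconvert --to notebook --execute --allow-errors SomeNotebook.ipynb (which by default writes SomeNotebook.nbconvert.ipynb; SomeNotebook is a placeholder), sys.exit(main(["SomeNotebook.nbconvert.ipynb"])) would fail the job only on the three listed error types. Matching on ename rather than on traceback text keeps the check independent of how the kernel formats tracebacks.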

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 10 comments.

File | Description
--- | ---
pyproject.toml | Add nbformat==5.10.4 (Plotly notebook rendering).
conda-environment.yml | Mirror the new nbformat=5.10.4 pin.
uv.lock | Lockfile updates for nbformat and transitive deps.
domains/java/queries/method-metrics/Cyclomatic_Method_Complexity_Distribution.cypher | New query (description has a duplicated word; uses lowercase asc).
domains/java/javaCsv.sh | Export the new cyclomatic-complexity CSV.
domains/java/javaCharts.py | Refactor line-count chart to normalized line chart; add cyclomatic chart.
domains/java/explore/MethodMetricsJavaExploration.ipynb | Notebook fixes & path update for the moved cypher file.
domains/external-dependencies/externalDependencyCharts.py | Drop unnecessary head(20) calls; widen < to <= in drill-down filter (inconsistent with grouping).
domains/external-dependencies/explore/ExternalDependenciesJava.ipynb | Type annotations, password validation, path updates, bug-fix variable rename in drill-down cell.
domains/external-dependencies/explore/ExternalDependenciesTypescript.ipynb | Same modernization & path updates as the Java notebook.
domains/git-history/explore/GitHistoryGeneralExploration.ipynb | Cast Plotly date columns to strings; remove SVG write helper; metadata Python version downgraded (3.12.8) and diverges from conda env.
domains/node-embeddings/explore/NodeEmbeddingsJavaExploration.ipynb | Path updates and type-ignore pragmas.
domains/node-embeddings/explore/NodeEmbeddingsTypescriptExploration.ipynb | Same path/pragma updates.
domains/anomaly-detection/explore/NodeEmbeddingsHyperparameterTuningExploration.ipynb | Replace t-SNE with UMAP, add copy=True to HDBSCAN, cypher path updates; stray execution_count: 43.
.github/workflows/internal-check-notebooks.yml | New matrix workflow that smoke-tests all explore notebooks.
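
The normalized line chart listed for javaCharts.py presumably divides each artifact's method counts by that artifact's total, so artifacts of very different sizes share one y-axis scale. The code below is a plain-Python sketch of that normalization under stated assumptions: the (artifact, complexity, methodCount) row layout and the function name normalize_per_artifact are hypothetical, since neither the CSV columns nor the chart code appear in this PR page.

```python
from collections import defaultdict


def normalize_per_artifact(
    rows: list[tuple[str, int, int]],
) -> dict[str, dict[int, float]]:
    """Convert (artifact, complexity, methodCount) rows into per-artifact percentages.

    Within each artifact the percentages sum to 100, so small and large
    artifacts can be compared as lines on the same chart.
    """
    # Sum method counts per artifact and complexity value.
    counts: defaultdict[str, defaultdict[int, int]] = defaultdict(lambda: defaultdict(int))
    for artifact, complexity, method_count in rows:
        counts[artifact][complexity] += method_count

    normalized: dict[str, dict[int, float]] = {}
    for artifact, by_complexity in counts.items():
        total = sum(by_complexity.values())
        if total == 0:
            continue  # nothing to normalize; skip instead of dividing by zero
        normalized[artifact] = {
            complexity: 100.0 * count / total
            for complexity, count in sorted(by_complexity.items())
        }
    return normalized
```

Each artifact then becomes one line with complexity on the x-axis and the percentage of that artifact's methods on the y-axis; sorting by complexity keeps the points in plotting order.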

Comment thread domains/external-dependencies/externalDependencyCharts.py Outdated
Comment thread domains/java/javaCharts.py Outdated
Comment thread domains/git-history/explore/GitHistoryGeneralExploration.ipynb Outdated
Comment thread .github/workflows/internal-check-notebooks.yml
Comment thread .github/workflows/internal-check-notebooks.yml Outdated
Comment thread domains/java/javaCharts.py
JohT force-pushed the feature/improve-plots branch from 6705027 to ddf3dea on May 16, 2026 18:54
JohT requested a review from Copilot on May 16, 2026 19:36
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 6 comments.

Comment thread domains/java/javaCharts.py Outdated
Comment thread domains/java/explore/MethodMetricsJavaExploration.ipynb Outdated
Comment thread .github/workflows/internal-check-notebooks.yml Outdated
Comment thread domains/external-dependencies/explore/ExternalDependenciesJava.ipynb Outdated
Comment thread pyproject.toml
JohT marked this pull request as ready for review on May 16, 2026 20:05
JohT force-pushed the feature/improve-plots branch 3 times, most recently from 5a0913f to 5a0c57f on May 17, 2026 06:31
JohT requested a review from Copilot on May 17, 2026 08:23
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

domains/external-dependencies/externalDependencyCharts.py:535

  • Removing the top20 = ...head(20) pre-slicing here (and in all the other save_pie_chart_pair callsites below in this function and in generate_typescript_charts) means save_pie_chart_pair now receives the full overall/spread datasets, which can contain hundreds of rows. The chart relies entirely on group_small_values_into_others's percentage threshold to keep the pie readable. Please verify on a large project (e.g. AxonFramework) that the resulting pie charts and legends remain legible: without the explicit top-20 cap, charts with many low-but-not-tiny-percentage slices can become unreadable. If the result is visually acceptable, consider documenting that the "Top N" naming in chart_name_prefix (Java_Top_external_packages_by_types, etc.) no longer reflects a strict top-N selection.

    # ── Top external packages (Table 1 equivalent) ────────────────────────────
    if not overall_data.empty:
        save_pie_chart_pair(
            source_data=overall_data,
            value_column="numberOfExternalCallerTypes",
            name_column="externalPackageName",
            chart_name_prefix="Java_Top_external_packages_by_types",
            primary_threshold_percent=0.7,
            report_directory=report_directory,
            verbose=verbose,
        )
        save_pie_chart_pair(
            source_data=overall_data,
            value_column="numberOfExternalCallerPackages",
            name_column="externalPackageName",
            chart_name_prefix="Java_Top_external_packages_by_packages",
            primary_threshold_percent=0.7,
            report_directory=report_directory,
            verbose=verbose,
        )

    # ── Second-level package grouping (Table 2 equivalent) ────────────────────
    if not second_level_overall_data.empty:
        save_pie_chart_pair(
            source_data=second_level_overall_data,
            value_column="numberOfExternalCallerTypes",
            name_column="externalSecondLevelPackageName",
            chart_name_prefix="Java_Top_second_level_packages_by_types",
            primary_threshold_percent=0.7,
            report_directory=report_directory,
            verbose=verbose,
        )
        save_pie_chart_pair(
            source_data=second_level_overall_data,
            value_column="numberOfExternalCallerPackages",
            name_column="externalSecondLevelPackageName",
            chart_name_prefix="Java_Top_second_level_packages_by_packages",
            primary_threshold_percent=0.7,
            report_directory=report_directory,
            verbose=verbose,
        )

    # ── Most spread external packages (Table 3 equivalent) ────────────────────
    if not spread_data.empty:
        save_pie_chart_pair(
            source_data=spread_data,
            value_column="sumNumberOfTypes",
            name_column="externalPackageName",
            chart_name_prefix="Java_Most_spread_packages_by_types",
            primary_threshold_percent=0.5,
            report_directory=report_directory,
            verbose=verbose,
        )
        save_pie_chart_pair(
            source_data=spread_data,
            value_column="sumNumberOfPackages",
            name_column="externalPackageName",
            chart_name_prefix="Java_Most_spread_packages_by_packages",
            primary_threshold_percent=0.5,
            report_directory=report_directory,
            verbose=verbose,
        )

    # ── Most spread second-level packages (Table 4 equivalent) ────────────────
    if not second_level_spread_data.empty:
        save_pie_chart_pair(
            source_data=second_level_spread_data,
            value_column="sumNumberOfTypes",
            name_column="externalSecondLevelPackageName",
            chart_name_prefix="Java_Most_spread_second_level_packages_by_types",
            primary_threshold_percent=0.5,
            report_directory=report_directory,
            verbose=verbose,
        )
        save_pie_chart_pair(
            source_data=second_level_spread_data,
            value_column="sumNumberOfPackages",
            name_column="externalSecondLevelPackageName",
            chart_name_prefix="Java_Most_spread_second_level_packages_by_packages",
            primary_threshold_percent=0.5,
            report_directory=report_directory,
            verbose=verbose,
        )
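
The concern above hinges on how small slices are folded into an "Others" bucket, and the file summary notes that the drill-down filter now uses <= while the grouping does not. A minimal sketch of one way to keep both in agreement is to route both through a single predicate. The names is_small_slice, group_small_slices_into_others and select_other_slices are hypothetical stand-ins; the actual group_small_values_into_others in externalDependencyCharts.py is not shown here and its signature may differ. Thresholds are percentages, like primary_threshold_percent=0.7 in the snippet.

```python
def is_small_slice(value: float, total: float, threshold_percent: float) -> bool:
    """Shared predicate: a slice is 'small' when its share is strictly below the threshold."""
    return total > 0 and 100.0 * value / total < threshold_percent


def group_small_slices_into_others(
    slices: dict[str, float], threshold_percent: float, others_label: str = "Others"
) -> dict[str, float]:
    """Keep large slices as-is and sum all small ones into a single 'Others' slice."""
    total = sum(slices.values())
    grouped: dict[str, float] = {}
    others = 0.0
    for name, value in slices.items():
        if is_small_slice(value, total, threshold_percent):
            others += value
        else:
            grouped[name] = value
    if others > 0:
        grouped[others_label] = others
    return grouped


def select_other_slices(slices: dict[str, float], threshold_percent: float) -> dict[str, float]:
    """Return exactly the slices that were folded into 'Others' (for a drill-down chart)."""
    total = sum(slices.values())
    return {
        name: value
        for name, value in slices.items()
        if is_small_slice(value, total, threshold_percent)
    }
```

Because both functions call the same predicate, a slice sitting exactly at the threshold lands in either the main pie or the drill-down, never in both and never in neither; switching between < and <= then only has to happen in one place.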

Comment thread domains/external-dependencies/explore/ExternalDependenciesJava.ipynb Outdated
Comment thread domains/node-embeddings/explore/NodeEmbeddingsJavaExploration.ipynb Outdated
Comment thread domains/java/javaCharts.py Outdated
Comment thread domains/java/javaCharts.py Outdated
Comment thread .github/workflows/internal-check-notebooks.yml
Comment thread .github/workflows/internal-check-notebooks.yml
JohT force-pushed the feature/improve-plots branch from 346969a to 9347997 on May 17, 2026 09:50
JohT merged commit e7db08c into main on May 17, 2026
14 checks passed
JohT deleted the feature/improve-plots branch on May 17, 2026 10:33

Labels

None yet

Projects

None yet


2 participants