This repository provides an automated code graph analysis pipeline built on jQAssistant and Neo4j. It supports Java and experimental TypeScript analysis, capturing both the structure and evolution of your code base.
Ever wondered which libraries matter most, how your modules build on each other, which parts have few contributors, which files change together, or where structural anomalies emerge?
This project helps uncover such patterns through graph-based analysis, visualization, and machine learning, offering hundreds of expert-level reports for deep code insights.
Curious? Explore the examples at code-graph-analysis-examples and get started with GETTING_STARTED.md.
- Analyze static code structure as a graph
- Supports Java Code Analysis
- Supports TypeScript Code Analysis (experimental)
- Fully automated pipeline for Java from tool installation to report generation
- Fully automated pipeline for TypeScript from tool installation to report generation
- Fully automated local run
- Easily integrable into your continuous integration pipeline
- More than 200 CSV reports for dependencies, metrics, cycles, annotations, algorithms and many more
- Python generated charts for dependencies, metrics, visibility and many more
- Markdown summary reports for anomalies, archetypes, git history and many more
- Anomaly detection powered by unsupervised machine learning and explainable AI
- Graph structure visualization
- Automated reference document generation
- Runtime and library independent automation using shell scripts
- Tested on macOS (zsh), Linux (bash) and Windows (Git Bash)
- Comprehensive list of Cypher queries
- Example analysis for AxonFramework
- Example analysis for react-router
- May 2026: Version 4.0.0 introduces independently runnable analysis domains. Select specific domains via `--domain` or exclude them with `--exclude-domain`. No more monolithic execution. See MIGRATION.md for details.
- November 2025: Removed the "graph-visualization" node package, deprecated since version 2.x
- November 2025: Treemap charts for anomalies and archetypes
- October 2025: Graph visualizations for anomaly archetypes
- October 2025: Anomaly archetypes with markdown summary
- August 2025: Association rule mining for co-changing files in git history
- August 2025: Anomaly detection powered by unsupervised machine learning and explainable AI
- May 2025: Migrated to Neo4j 2025.x and Java 21.
The repository is organized first by problem space. Most functionality lives in self-contained domains under domains, each bundling the scripts, Cypher queries, templates, and exploratory notebooks for one analysis area.
The report types Csv, Python, Markdown, and Visualization are secondary execution modes selected via --report. They cut across domains, while --domain narrows a run to one domain. Not every domain implements every report type.
If you think in architecture terms: domains are the vertical slices, report types are the cross-cutting execution modes.
By default, three compute-intensive domains are deactivated to reduce analysis time: anomaly-detection, node-embeddings, and graph-algorithms. Activate them individually via `--domain`, or pass `--exclude-domain ""` to run all domains.
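The two ways of re-enabling the deactivated domains can be sketched as a dry run. The helper below only prints the command lines so they can be reviewed before a long analysis is started. `analyze_command` is a hypothetical helper and not part of the repository. The script path is the one used in the FAQ examples, and passing the domain name as a separate argument after `--domain` is an assumption based on how `--profile` is used there.

```shell
#!/usr/bin/env sh
# Path as used in the FAQ examples; it assumes the current directory is an analysis workspace.
ANALYZE_SCRIPT="./../../scripts/analysis/analyze.sh"

# Hypothetical dry-run helper: prints the command line instead of executing it.
# The body runs in a subshell "( ... )" so the loop variable does not leak.
analyze_command() (
  printf '%s' "${ANALYZE_SCRIPT}"
  for argument in "$@"; do
    if [ -z "${argument}" ]; then
      printf ' ""'   # show an empty argument explicitly
    else
      printf ' %s' "${argument}"
    fi
  done
  printf '\n'
)

# Activate one of the domains that are deactivated by default.
analyze_command --domain anomaly-detection

# Exclude nothing, so that all domains run.
analyze_command --exclude-domain ""
```

Once the printed command looks right, it can be copied and run from the analysis workspace. COMMANDS.md remains the reference for the exact option syntax.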
| Domain | Description | Java Example | TypeScript Example | Notebooks | Example Chart |
|---|---|---|---|---|---|
| Anomaly Detection | Machine-learning-supported structural anomaly detection | AxonFramework | react-router | Explore | Anomalies |
| Archetypes | Structural roles: authority, bottleneck, and hub | AxonFramework | react-router | - | Treemap |
| Cyclic Dependencies | Cycle analysis for Java artifacts, packages, and TypeScript modules | AxonFramework | react-router | Java, TypeScript | Cycle Graph |
| External Dependencies | Usage of external libraries, packages, modules, and namespaces | AxonFramework | react-router | Java, TypeScript | Most Spread Packages |
| Git History | Change frequency, co-change patterns, authorship, and repository evolution | AxonFramework | react-router | General, Correlation | Co-Changing Files |
| Graph Algorithms | Centrality, communities, similarity, and other Graph Data Science results | AxonFramework | react-router | - | - |
| Internal Dependencies | Internal structure, path finding, topological order, OOD metrics, visibility metrics, and word clouds | AxonFramework | react-router | Java, TypeScript | Code Wordcloud |
| Java | Java code quality, method metrics, annotations, and artifact dependency analysis | AxonFramework | - | Method Metrics | Artifact Dependencies |
| Node Embeddings | Graph embeddings and 2D projections for structural exploration | AxonFramework | react-router | Java, TypeScript | Package Embeddings 2D |
| Overview | High-level project structure, composition, counts, and complexity distributions | AxonFramework | react-router | Java, TypeScript | Packages Per Artifact |
- Neo4j Management - Neo4j setup, configuration, start, stop, and memory profile management. Usually used indirectly through analyze.sh.
Here is a curated overview of report examples and exploratory notebooks from code-graph-analysis-examples. These examples are grouped by user-facing output, not by domain.
- External Dependencies contains detailed information about external library usage (Notebook).
- Git History contains information about the git history of the analyzed code (Notebook).
- Internal Dependencies is based on "Analyze java package metrics in a graph database" (Notebook).
- Cyclic Dependencies contains information about cyclic dependencies in the analyzed code (Notebook).
- Java Method Metrics shows how the effective number of lines of code and the cyclomatic complexity are distributed across the methods in the code (Notebook).
- Node Embeddings shows how to generate node embeddings and reduce their dimensionality so that they can be visualized in a 2D plot (Notebook).
- Object Oriented Design Quality Metrics is based on OO Design Quality Metrics by Robert Martin (Notebook).
- Overview contains overall statistics and details about methods and their complexity (Notebook).
- Visibility Metrics (Notebook).
- Wordcloud contains a visual representation of package and class names (Notebook).
- Java Archetypes Treemap (Python Script)
These examples show selected outputs powered by Neo4j's Graph Data Science Library across several domains. For a complete list, see the CSV Cypher Query Report Reference.
- Centrality with PageRank (Source Script)
- Community Detection with Leiden (Source Script)
- Node Embeddings with HashGNN (Source Script)
- Path Finding with All Pairs Shortest Path (Source Script)
- Similarity with Jaccard (Source Script)
- Topological Sort (Source Script)
Here are some fully automated graph visualizations utilizing GraphViz from code-graph-analysis-examples:
- Java Artifact Build Levels (Query, Source Script)
- Java Artifact Longest Path Contributors (Query, Source Script)
- Java Package Top #1 Authority Archetype and contributing packages (Query, Source Script)
- Analyze java dependencies with jQAssistant
- Analyze java package metrics in a graph database (Part 2)
- Unleashing the Power of Graphs in Java Code Structure Analysis - Engineering Kiosk Alps Meetup, December 2023
- How anomalous is your code? - AI Meetup Austria, February 2026
Run scripts/checkCompatibility.sh to check if all required dependencies are installed and available in your environment.
- Java 21 is required since Neo4j 2025.01. See also Changes from Neo4j 5 to 2025.x.
- Java 17 is required for Neo4j 5.
- On Windows it is recommended to use the Git Bash provided by Git for Windows.
- jq, the "lightweight and flexible command-line JSON processor", needs to be installed. Latest releases: https://github.com/jqlang/jq/releases/latest. Check using `jq --version`.
- Set the environment variable `NEO4J_INITIAL_PASSWORD` to a password of your choice. For example: `export NEO4J_INITIAL_PASSWORD=neo4j_password_of_my_choice`
  To run Jupyter notebooks, create an `.env` file in the folder from where you open the notebook, containing for example: `NEO4J_INITIAL_PASSWORD=neo4j_password_of_my_choice`
- Python is required for Python reports.
- uv is the primary Python package manager (default). Install from https://docs.astral.sh/uv/getting-started/installation/.
- Conda is a supported optional path. Use for example Miniconda or Anaconda (recommended for Windows). Set `PYTHON_PACKAGE_MANAGER=conda` to activate.
- For Conda on Windows, add this line to your `~/.bashrc`: `/c/ProgramData/Anaconda3/etc/profile.d/conda.sh`. Run `conda init` in Git Bash as administrator.
- Please follow the description on how to create a JSON file with the static code information of your TypeScript project here: https://github.com/jqassistant-plugin/jqassistant-typescript-plugin
  This could be as simple as running the following command in your TypeScript project: `npx --yes @jqassistant/ts-lce`
- The cloned repository or source project needs to be copied into the directory called `source` within the analysis workspace, so that it is also picked up during the scan by resetAndScan.sh and the optional importGit.sh.
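A quick manual check of the prerequisites above can be sketched as follows. This is only an illustration and not the repository's scripts/checkCompatibility.sh, which remains the authoritative check. The function names `is_installed` and `report_missing_prerequisites` are made up for this example, and the tools checked are just the ones named above.

```shell
#!/usr/bin/env sh
# Hypothetical helper: succeeds if the given command is found on the PATH.
is_installed() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: prints one line per missing prerequisite and
# returns a non-zero status if anything is missing.
report_missing_prerequisites() (
  missing=0
  for tool in "$@"; do
    if ! is_installed "${tool}"; then
      echo "Missing command: ${tool}"
      missing=1
    fi
  done
  if [ -z "${NEO4J_INITIAL_PASSWORD:-}" ]; then
    echo "Environment variable NEO4J_INITIAL_PASSWORD is not set"
    missing=1
  fi
  return "${missing}"
)

# java and jq are always needed; add uv (or conda) when Python reports are wanted.
report_missing_prerequisites java jq || echo "Please install or configure the items listed above."
```

The sketch only tests whether the commands exist. It does not verify versions such as Java 21 for Neo4j 2025.x.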
See GETTING_STARTED.md on how to get started on your local machine.
See INTEGRATION.md on how to integrate code analysis in your continuous integration pipeline. Currently (2025), only GitHub Actions is supported.
Source: analysis_process_graph.gv
The analysis script analyze.sh orchestrates 5 phases: Setup (Neo4j/jQAssistant), Scan & Analysis (code scanning), Prepare (graph enrichment), Report Generation (domain-specific reports), and Cleanup. See COMMANDS.md for CLI options and detailed flow.
The Code Structure Analysis Pipeline utilizes GitHub Actions to automate the whole analysis process:
- Use GitHub Actions Linux Runner
- Check out the Git repository
- Set up Java
- Set up uv (primary Python package manager)
- Set up Python with the Conda package manager Mambaforge (optional alternative)
- Download artifacts and optionally source code that contain the code to be analyzed (scripts/downloader)
- Set up Neo4j Graph Database (analyze.sh)
- Set up jQAssistant for Java and TypeScript analysis (analyze.sh)
- Start Neo4j Graph Database (analyze.sh)
- Generate CSV reports (scripts/reports) using the command line JSON parser jq
- Use Neo4j Graph Data Science for community detection, centrality, similarity, node embeddings and topological sort (analyze.sh)
- Generate Python and Markdown reports using the libraries specified in conda-environment.yml
- HPCC-Systems (High Performance Computing Cluster) Web-Assembly (JavaScript) containing a wrapper for GraphViz to visualize graph structures.
- GraphViz for CLI Graph Visualization
- Check links in the Markdown documentation (GitHub workflow) with markdown-link-check.
Big shout-out to all the creators and contributors of these great libraries. Projects like this wouldn't be possible without them. Feel free to create an issue if something is missing or wrong in the list.
COMMANDS.md contains further details on commands and how to do a manual setup.
CSV_REPORTS.md lists all CSV Cypher query result reports inside the results directory. It can be generated as described in Generate CSV Report Reference.
IMAGES.md lists all PNG images inside the results directory. It can be generated as described in Generate Image Reference.
SCRIPTS.md lists all shell scripts of this repository including their first comment line as a description. It can be generated as described in Generate Script Reference.
CYPHER.md lists all Cypher queries of this repository including their first comment line as a description. It can be generated as described in Generate Cypher Reference.
Cypher is Neo4j's graph query language that lets you retrieve data from the graph.
ENVIRONMENT_VARIABLES.md contains all environment variables that are supported by the scripts including default values and description. It can be generated as described in Generate Environment Variable Reference.
CHANGELOG.md contains all changes of this repository.
- How can I run an analysis locally?
  - Check the prerequisites.
  - See Start an analysis in the Commands Reference.
  - To get started from scratch, see GETTING_STARTED.md.
- How can I explore the graph manually?
  - After the analysis, start Neo4j and open the Neo4j Web UI (`http://localhost:7474/browser`).
- How can I add a CSV report to the pipeline?
  - Put your new Cypher query into the cypher directory or a suitable (new) subdirectory.
  - Create a new CSV report script in a domain directory under domains or in scripts/reports. Take for example overviewCsv.sh as a reference.
  - The script will automatically be included because of the directory and its name ending with "Csv.sh".
- How can I analyze a different code base automatically?
  - Create a new download script like the ones in the scripts/downloader directory. Take for example downloadAxonFramework.sh as a reference for Java projects and downloadReactRouter.sh as a reference for TypeScript projects.
  - After downloading, run analyze.sh. You can find these steps also in the pipeline as a reference.
- How can I trigger a full re-scan of all artifacts?
  - Delete the file `artifactsChangeDetectionHash.txt` in the `artifacts` directory.
  - Delete the file `typescriptFileChangeDetectionHashFile.txt` in the `source` directory to additionally re-scan TypeScript projects.
- How can I disable git log data import?
  - Set the environment variable `IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT` to `none`. Example: `export IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="none"`
  - Alternatively, prepend your command with `IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="none"`: `IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="none" ./../../scripts/analysis/analyze.sh`
  - An in-between option is to import only monthly aggregated changes using `IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="aggregated"`: `IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="aggregated" ./../../scripts/analysis/analyze.sh`
- What changed in version 4 regarding report generation?
  - Jupyter notebook execution, PDF generation (`ENABLE_JUPYTER_NOTEBOOK_PDF_GENERATION`) and `--report Jupyter` have been removed.
  - Use `--report All` to generate every report (recommended). `--report Markdown` produces Markdown summaries but is not always a drop-in replacement for the removed Jupyter pipeline on a fresh workspace: some Markdown summaries depend on prior CSV or Python outputs (for example, domains/overview/summary/overviewSummary.sh and domains/external-dependencies/summary/externalDependenciesSummary.sh). To get complete Markdown reports on a fresh workspace, either run `--report Csv` or `--report Python` for the affected domains first, or use `--report All`.
  - The 25 `explore/*.ipynb` notebooks in `domains/*/explore/` remain available for interactive exploration but are no longer executed automatically.
  - `nbconvert` is no longer required for automatic report generation and can be uninstalled. If you still want to open the `explore/*.ipynb` notebooks interactively, you can keep (or install) `jupyter` separately.
- How can I increase the heap memory when scanning large TypeScript projects?
  - Use the environment variable `TYPESCRIPT_SCAN_HEAP_MEMORY` in megabytes (default = 4096): `TYPESCRIPT_SCAN_HEAP_MEMORY=16384 ./../../scripts/analysis/analyze.sh`
- How can I continue on errors when scanning TypeScript projects instead of cancelling the whole analysis?
  - Use the profile `Neo4j-latest-continue-on-scan-errors` (default = `Neo4j-latest`): `./../../scripts/analysis/analyze.sh --profile Neo4j-latest-continue-on-scan-errors`
- How can I reduce the memory (RAM) consumption?
  - Use the profile `Neo4j-latest-low-memory` (default = `Neo4j-latest`): `./../../scripts/analysis/analyze.sh --profile Neo4j-latest-low-memory`
- How can I increase the memory (RAM) consumption?
  - Use the profile `Neo4j-latest-high-memory` (default = `Neo4j-latest`): `./../../scripts/analysis/analyze.sh --profile Neo4j-latest-high-memory`
- How can I increase the memory (RAM) consumption afterwards, when the setup is already done?
  - Simply run `useNeo4jHighMemoryProfile.sh` in your analysis working directory, or: `./../../domains/neo4j-management/useNeo4jHighMemoryProfile.sh`
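The naming convention from the question about adding a CSV report can be illustrated with a small sketch. It assumes that report scripts are discovered by their file name ending in "Csv.sh" inside the domains and scripts/reports directories. How the pipeline actually finds them may differ, and `list_csv_report_scripts` is a hypothetical helper written for this example.

```shell
#!/usr/bin/env sh
# Hypothetical helper: lists all files below the given directories
# whose name ends with "Csv.sh", sorted for a stable order.
list_csv_report_scripts() {
  find "$@" -type f -name '*Csv.sh' | sort
}

# Demonstration in a temporary directory with a made-up layout that
# loosely follows the directories named in the FAQ.
demo_directory=$(mktemp -d)
mkdir -p "${demo_directory}/domains/overview" "${demo_directory}/scripts/reports"
touch "${demo_directory}/domains/overview/overviewCsv.sh"     # listed
touch "${demo_directory}/scripts/reports/myNewReportCsv.sh"   # listed (hypothetical new report)
touch "${demo_directory}/scripts/reports/notAReport.sh"       # not listed: name does not end with "Csv.sh"

list_csv_report_scripts "${demo_directory}/domains" "${demo_directory}/scripts/reports"

rm -rf "${demo_directory}"
```

Running the same helper against the real domains and scripts/reports directories is a quick way to see whether a new script follows the expected file name pattern.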
- code-graph-analysis-examples
- Bite-Sized Neo4j for Data Scientists
- The Story behind Russian Twitter Trolls
- Graphs for Data Science and Machine Learning
- Modularity
- Graph Data Science Centrality Algorithms
- Graph Data Science Community Detection Algorithms
- Graph Data Science Similarity Algorithms
- Graph Data Science Topological Sort Algorithm
- Node embeddings for Beginners
