- Start an Analysis
  - Command Line Options
  - Notes
  - Examples
    - Start an analysis with CSV reports only
    - Start an analysis with Python reports only
    - Start an analysis without importing git log data
    - Only run setup and explore the Graph manually
    - Only run the reports of one specific domain
    - Only run the CSV reports of one specific domain
    - Rerun reports without restarting Neo4j
    - Run all reports except slow/optional domains
    - Run all domains except specific ones
- Generate Markdown References
- Validate Links in Markdown
- Manual Setup
  - Setup Neo4j Graph Database
  - Change Neo4j configuration template
  - Start Neo4j Graph Database
  - Setup jQAssistant Java Code Analyzer
  - Download Maven Artifacts to analyze
  - Sort out jar files containing external libraries
  - Download Typescript project to analyze
  - Reset the database and scan the java artifacts
  - Import git data
- Database Queries
- Stop Neo4j
- References
- Other Commands
Before starting an analysis, set up your analysis as described in the Getting Started guide. An analysis is then started with the script analyze.sh. To run all analysis steps, simply execute the following command:
./../../scripts/analysis/analyze.sh
Hint: Within the analysis workspace directory you can run analyze.sh directly without the path prefix, since the script is also available in the analysis workspace.
👉 See scripts/examples/analyzeAxonFramework.sh as an example script that combines all the above steps for a Java Project.
👉 See scripts/examples/analyzeReactRouter.sh as an example script that combines all the above steps for a Typescript Project.
👉 See Code Structure Analysis Pipeline on how to do this within a GitHub Actions Workflow.
The analyze.sh command comes with these command line options:
- --report Csv only generates CSV reports. This speeds up the report generation and doesn't depend on Python or any other related dependencies. The default value is All, which generates all reports. DatabaseCsvExport exports the whole graph database as a CSV file (performance-intensive; check for security concerns first).
- --profile Neo4jv4 uses the older long-term support (June 2023) version v4.4.x of Neo4j and compatible versions of plugins and jQAssistant. Without a profile, the newest versions are used. Other profiles can be found in the directory scripts/profiles.
- --profile Neo4jv5 uses the older long-term support (March 2025) version v5.26.x of Neo4j and compatible versions of plugins and jQAssistant.
- --profile Neo4j-latest-continue-on-scan-errors is based on the default profile (Neo4j-latest) but uses the jQAssistant configuration template template-neo4jv5-jqassistant-continue-on-error.yaml to continue on scan errors instead of failing fast. This is useful temporarily when a known error needs to be ignored. It is still recommended to use the default profile and fail fast if something is wrong.
- --profile Neo4j-latest-low-memory is based on the default profile (Neo4j-latest) but uses only half of the memory (RAM), as configured in template-neo4j-low-memory.conf. This is useful for analyzing smaller codebases with fewer resources.
- --profile Neo4j-latest-low-memory-continue-on-scan-errors is based on the default profile (Neo4j-latest) but uses only half of the memory (RAM), as configured in template-neo4j-low-memory.conf, and the jQAssistant configuration template template-neo4jv5-jqassistant-continue-on-error.yaml to continue on scan errors instead of failing fast. As with the other continue-on-error profile, use it only temporarily to ignore a known error and prefer failing fast with the default profile.
- --profile Neo4j-latest-high-memory is based on the default profile (Neo4j-latest) but uses more memory (RAM), as configured in template-neo4j-high-memory.conf. This is useful for analyzing larger codebases with more resources.
- --explore activates the "explore" mode, in which no reports are generated. Furthermore, Neo4j isn't stopped at the end of the script and keeps running. This makes it easy to set everything up and then explore the data manually using the running Neo4j server.
- --keep-running skips the Neo4j stop step at the end of the script, so Neo4j keeps running after the analysis completes. This is useful when running reports repeatedly (e.g. during development) to avoid the overhead of stopping and restarting Neo4j each time. Note: --explore implies --keep-running; combining both is redundant.
- --domain anomaly-detection selects a single analysis domain (a subdirectory of domains/) to run reports for, following a vertical-slice approach. When set, only that domain's report scripts run; core reports from scripts/reports/ and other domains are skipped. The domain option can be combined with --report to further narrow down which reports are generated, e.g. --domain anomaly-detection --report Csv. When not specified, the reports of all domains run, except for the domains excluded by --exclude-domain (see below). The selected domain name is passed to the report compilation scripts via the environment variable ANALYSIS_DOMAIN. Available domains can be found in the domains/ directory.
- --exclude-domain anomaly-detection,node-embeddings specifies a comma-separated list of domain names to skip during report generation. By default (when neither --domain nor --exclude-domain is set), the slow/optional domains anomaly-detection, node-embeddings and graph-algorithms are skipped. When --domain is specified, no domains are excluded by default (only the selected domain runs). Pass an empty string (--exclude-domain "") to override the defaults and skip nothing. Domain names must match subdirectories under domains/. Unrecognized domains cause a warning but no error. The list is exported to all report compilation scripts as the environment variable ANALYSIS_DOMAINS_TO_SKIP.
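The interplay of --domain, --exclude-domain and the default exclusions can be summarized in a small shell sketch. It is not the actual option handling of analyze.sh; the function name resolve_analysis_options and its one-line output format are hypothetical and only illustrate the rules listed above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: resolve analyze.sh-like options into the values that would end up in
# ANALYSIS_DOMAIN and ANALYSIS_DOMAINS_TO_SKIP. Not the actual analyze.sh implementation.
DEFAULT_DOMAINS_TO_SKIP="anomaly-detection,node-embeddings,graph-algorithms"

resolve_analysis_options() {
  local report="All" domain="" exclude="" exclude_set="false"
  local keep_running="false" explore="false"
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --report|--domain|--exclude-domain)
        # Options with a value: guard against a missing value to avoid an endless loop.
        if [[ $# -lt 2 ]]; then
          echo "Missing value for option $1" >&2
          return 1
        fi
        case "$1" in
          --report) report="$2" ;;
          --domain) domain="$2" ;;
          --exclude-domain) exclude="$2"; exclude_set="true" ;;
        esac
        shift 2 ;;
      --keep-running) keep_running="true"; shift ;;
      --explore) explore="true"; keep_running="true"; shift ;; # --explore implies --keep-running
      *) echo "Unknown option: $1" >&2; return 1 ;;
    esac
  done
  # Default exclusions only apply when neither --domain nor --exclude-domain is given.
  if [[ "${exclude_set}" == "false" && -z "${domain}" ]]; then
    exclude="${DEFAULT_DOMAINS_TO_SKIP}"
  fi
  echo "report=${report} domain=${domain} skip=${exclude} keep_running=${keep_running} explore=${explore}"
}

# Usage:
# resolve_analysis_options --domain anomaly-detection --report Csv
# prints: report=Csv domain=anomaly-detection skip= keep_running=false explore=false
```

Passing --exclude-domain "" sets the list explicitly to empty, which is why the sketch tracks whether the option was given at all instead of checking for an empty value.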
- Be sure to use Java 21 for Neo4j v2025, Java 17 for v5 and Java 11 for v4. For details, see Neo4j System Requirements / Java.
- Use your own initial Neo4j password (environment variable NEO4J_INITIAL_PASSWORD).
- For more details have a look at the script analyze.sh
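To check the Java requirement from the first note before starting, a sketch like the following could map the Neo4j version to the Java version listed there and extract the major version from the output of java -version. Both function names are hypothetical and not part of the scripts.

```shell
#!/usr/bin/env bash
# Sketch: map a Neo4j version to the Java version named in the notes
# (Java 21 for Neo4j 2025.x, Java 17 for 5.x, Java 11 for 4.x).
required_java_for_neo4j() {
  case "$1" in
    2025*) echo 21 ;;
    5*) echo 17 ;;
    4*) echo 11 ;;
    *) echo "Unknown Neo4j version: $1" >&2; return 1 ;;
  esac
}

# Extract the Java major version from the first line of "java -version",
# e.g. 'openjdk version "17.0.9" 2023-10-17' -> 17 and 'java version "1.8.0_392"' -> 8.
java_major_version() {
  local version
  version=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' <<< "$1" | head -n 1)
  if [[ "${version}" == 1.* ]]; then
    version="${version#1.}" # old "1.x" versioning scheme
  fi
  echo "${version%%[.+-]*}"
}

# Usage (requires a local Java installation):
# required="$(required_java_for_neo4j 5.26.0)"
# actual="$(java_major_version "$(java -version 2>&1 | head -n 1)")"
# [[ "${actual}" == "${required}" ]] || echo "Expected Java ${required}, found ${actual}"
```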
If only the CSV reports are needed, which are the results of Cypher queries and don't need any further dependencies (like Python), the analysis can be sped up with:
./../../scripts/analysis/analyze.sh --report Csv
If you only need Python reports, e.g. to get expressive charts, you can run the Python reports independently with:
./../../scripts/analysis/analyze.sh --report Python
To speed up the analysis and reduce the data footprint, you can switch off the git log data import of the "source" directory (if present) with IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="none" as shown below. Alternatively, choose IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="aggregated" to reduce the data size by only importing monthly grouped changes instead of all commits.
IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="none" ./../../scripts/analysis/analyze.sh
To prepare everything for analysis, including installation, configuration and preparation queries, and then explore the graph manually without report generation, use this command:
./../../scripts/analysis/analyze.sh --explore
To only run the reports of a single analysis domain (vertical slice, no additional Python or Node.js dependencies for core reports):
./../../scripts/analysis/analyze.sh --domain anomaly-detection
To further narrow down to only one report type within a specific domain:
./../../scripts/analysis/analyze.sh --domain anomaly-detection --report Csv
When iterating on reports during development, use --keep-running to skip the Neo4j stop step and avoid the overhead of restarting it on the next run:
./../../scripts/analysis/analyze.sh --domain git-history --report Csv --keep-running
To run the analysis with all domains enabled (overriding the default exclusions of anomaly-detection, node-embeddings and graph-algorithms):
./../../scripts/analysis/analyze.sh --exclude-domain ""
To skip specific domains during analysis (e.g. skip anomaly-detection and node-embeddings but run everything else):
./../../scripts/analysis/analyze.sh --exclude-domain "anomaly-detection,node-embeddings"
Execute the script generateCypherReference.sh from the root directory with the following command:
./scripts/documentation/generateCypherReference.sh
Change into the scripts directory (e.g. with cd scripts) and then execute the script generateScriptReference.sh with the following command:
./documentation/generateScriptReference.sh
Change into the scripts directory (e.g. with cd scripts) and then execute the script generateEnvironmentVariableReference.sh with the following command:
./documentation/generateEnvironmentVariableReference.sh
The following command shows how to use markdown-link-check to check the links in, for example, the README.md file and the other listed Markdown files:
npx --yes markdown-link-check --quiet --progress --config=markdown-lint-check-config.json README.md COMMANDS.md GETTING_STARTED.md INTEGRATION.md CHANGELOG.md
The manual setup is only documented for completeness. It isn't needed since the analysis also covers download, installation and configuration of all needed tools.
If any of the scripts are not allowed to be executed, use chmod +x ./scripts/ followed by the script file name to grant execution permission.
Use setupNeo4j.sh to download Neo4j and install the plugins APOC and Graph Data Science.
This script requires the environment variable NEO4J_INITIAL_PASSWORD to be set. It sets the initial password with a temporary NEO4J_HOME environment variable to not interfere with a possibly globally installed Neo4j installation.
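The two mechanisms mentioned here, a mandatory environment variable and a temporary NEO4J_HOME, can be illustrated with plain shell features. This is a sketch under the assumption that the scripts work similarly; require_env and run_with_neo4j_home are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch (assumption): fail early if a required environment variable is missing or empty.
require_env() {
  local name="$1"
  if [[ -z "${!name:-}" ]]; then
    echo "Environment variable ${name} is required but not set." >&2
    return 1
  fi
}

# Sketch (assumption): run a command with a temporary NEO4J_HOME.
# The VAR=value prefix only applies to the environment of that single command,
# so a globally configured NEO4J_HOME in the calling shell stays untouched.
run_with_neo4j_home() {
  local neo4j_home="$1"
  shift
  NEO4J_HOME="${neo4j_home}" "$@"
}

# Usage:
# require_env NEO4J_INITIAL_PASSWORD || exit 1
# run_with_neo4j_home "$(pwd)/tools/neo4j-community-4.4.20" tools/neo4j-community-4.4.20/bin/neo4j-admin memrec
```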
Use configureNeo4j.sh to apply a different Neo4j configuration template from the domains/neo4j-management/configuration directory. This can be useful to optimize Neo4j for different workloads. Example:
NEO4J_CONFIG_TEMPLATE=template-neo4j-high-memory.conf ./domains/neo4j-management/configureNeo4j.sh
Hint: If you want to switch to the high memory profile as in the example, there is a simpler solution. Just run useNeo4jHighMemoryProfile.sh from the analysis workspace directory, which sets the environment variable NEO4J_CONFIG_TEMPLATE and runs configureNeo4j.sh for you.
Use startNeo4j.sh to start the locally installed Neo4j Graph database.
It runs the script with a temporary NEO4J_HOME environment variable to not interfere with a possibly globally installed Neo4j installation.
Hint: Within the analysis workspace directory you can simply run startNeo4j.sh directly without the ../../ prefix since the script is also available in the analysis workspace.
Use setupJQAssistant.sh to download jQAssistant.
Use downloadMavenArtifact.sh with the following options to download a Maven artifact into the artifacts directory (-g, -a and -v are mandatory):
- -g <maven group id>
- -a <maven artifact name>
- -v <maven artifact version>
- -t <maven artifact type (optional, defaults to jar)>
- -d <target directory for the downloaded file (optional, defaults to "artifacts")>
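The Maven coordinates map to a download URL following the standard Maven repository layout, where the dots of the group id become path separators. The following sketch assumes Maven Central (repo1.maven.org) as the repository; which repository and download tool downloadMavenArtifact.sh actually uses may differ, and maven_central_url is a hypothetical name. The Axon Framework coordinates in the usage comment are only illustrative.

```shell
#!/usr/bin/env bash
# Sketch (assumption): build the Maven Central URL for the coordinates given with -g, -a, -v and -t.
maven_central_url() {
  local group_id="$1" artifact_id="$2" version="$3" type="${4:-jar}" # type defaults to jar
  local group_path="${group_id//.//}" # org.axonframework -> org/axonframework
  echo "https://repo1.maven.org/maven2/${group_path}/${artifact_id}/${version}/${artifact_id}-${version}.${type}"
}

# Usage (illustrative coordinates):
# url="$(maven_central_url org.axonframework axon-messaging 4.10.1)"
# curl -fsSL -o "artifacts/$(basename "${url}")" "${url}"
```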
After collecting all the Java artifacts, it might be necessary to sort out external libraries you don't want to analyze directly. For that, you can use sortOutExternalJavaJarFiles.sh. It needs to be started in the directory of the jar files ("artifacts") of your analysis workspace and creates a new directory called "ignored-jars" next to the "artifacts" directory, so that those jars don't get analyzed.
Here is an example that can be started from your temp analysis workspace and that will filter out all jar files that don't contain any org.neo4j package:
cd artifacts; ./../../../scripts/sortOutExternalJavaJarFiles.sh org.neo4j
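The filter criterion "contains any org.neo4j package" can be sketched by looking at the entry list of each jar. This is an assumption about how such a check could work, not the implementation of sortOutExternalJavaJarFiles.sh; jar_listing_contains_package is a hypothetical helper that reads one entry path per line from stdin, for example from unzip -Z1.

```shell
#!/usr/bin/env bash
# Sketch (assumption, not sortOutExternalJavaJarFiles.sh): succeed if a jar entry list read
# from stdin (one path per line) contains an entry within the given package,
# e.g. "org.neo4j" -> entries starting with "org/neo4j/".
jar_listing_contains_package() {
  local package_path="${1//.//}/"
  awk -v prefix="${package_path}" 'index($0, prefix) == 1 { found = 1 } END { exit !found }'
}

# Usage inside the "artifacts" directory (moves jars without org.neo4j classes aside):
# mkdir -p ../ignored-jars
# for jar in *.jar; do
#   if ! unzip -Z1 "${jar}" | jar_listing_contains_package org.neo4j; then
#     mv "${jar}" ../ignored-jars/
#   fi
# done
```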
Use downloadTypescriptProject.sh with the following options to download a Typescript project using git clone and prepare it for analysis:
- --url: Git clone URL (required)
- --version: Version of the project
- --tag: Tag to switch to after "git clone" (optional, default = version)
- --project: Name of the project/repository (optional, default = clone URL file name without .git extension)
- --packageManager: One of "npm", "pnpm" or "yarn" (optional, default = "npm")
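The documented defaults can be expressed as a small shell sketch. resolve_typescript_project_defaults and its output format are hypothetical; the sketch only mirrors the defaults listed above, not the internals of downloadTypescriptProject.sh.

```shell
#!/usr/bin/env bash
# Sketch: derive the documented defaults of the download options.
# Arguments: url version [tag] [project] [packageManager]
resolve_typescript_project_defaults() {
  local url="$1" version="$2" tag="${3:-}" project="${4:-}" package_manager="${5:-npm}"
  if [[ -z "${project}" ]]; then
    project=$(basename "${url}" .git) # clone URL file name without .git extension
  fi
  tag="${tag:-${version}}" # tag defaults to the version
  case "${package_manager}" in
    npm|pnpm|yarn) ;;
    *) echo "Unsupported package manager: ${package_manager}" >&2; return 1 ;;
  esac
  echo "project=${project} tag=${tag} packageManager=${package_manager}"
}
```

For the react-router example, the project name resolves to react-router because basename strips both the URL path and the .git suffix.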
Here is an example:
./../../downloadTypescriptProject.sh \
--url https://github.com/remix-run/react-router.git \
--version 6.24.0 \
--tag "react-router@6.24.0" \
  --packageManager pnpm
Use resetAndScan.sh to scan the local artifacts directory with the previously downloaded Java artifacts and write the data into the local Neo4j Graph database using jQAssistant. It also uses some jQAssistant "concepts" to enhance the data further with relationships between artifacts and packages.
Be aware that this script deletes all previous relationships and nodes in the local Neo4j Graph database.
Use importGit.sh to import git data into the Graph.
It uses git log to extract commits, their authors and the names of the files changed with them. These are stored in an intermediate CSV file and are then imported into Neo4j with the following schema:
(Git:Log:Author)-[:AUTHORED]->(Git:Log:Commit)-[:CONTAINS_CHANGED]->(Git:Log:File)
(Git:Log:Commit)-[:HAS_PARENT]->(Git:Log:Commit)
(Git:Repository)-[:HAS_COMMIT]->(Git:Log:Commit)
(Git:Repository)-[:HAS_AUTHOR]->(Git:Log:Author)
(Git:Repository)-[:HAS_FILE]->(Git:Log:File)
👉 Note: Commits whose messages contain [bot] are filtered out to ignore changes made by bots.
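The bot filter can be illustrated on simple text lines. The sketch below assumes a hypothetical "hash|author|message" line format, as git log --pretty=format:'%h|%an|%s' would produce it; how importGit.sh formats and filters its intermediate CSV file may differ.

```shell
#!/usr/bin/env bash
# Sketch (assumption): drop commits whose message contains "[bot]".
# Input: one commit per line in the hypothetical format "hash|author|message".
# Everything after the second "|" counts as the message, so messages may contain "|".
filter_bot_commits() {
  awk '{
    message = $0
    sub(/^[^|]*[|][^|]*[|]/, "", message)
    if (index(message, "[bot]") == 0) print
  }'
}

# Usage:
# git log --pretty=format:'%h|%an|%s' | filter_bot_commits
```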
Instead of importing every single commit, changes can be grouped by month including their commit count. This is in many cases sufficient and reduces data size and processing time significantly. To do this, set the environment variable IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT to aggregated. If you don't want to set the environment variable globally, then you can also prepend the command with it like this (inside the analysis workspace directory contained within temp):
IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT="aggregated" ./../../domains/git-history/import/importGit.sh
Here is the resulting schema:
(Git:Log:Author)-[:AUTHORED]->(Git:Log:ChangeSpan)-[:CONTAINS_CHANGED]->(Git:Log:File)
(Git:Repository)-[:HAS_CHANGE_SPAN]->(Git:Log:ChangeSpan)
(Git:Repository)-[:HAS_AUTHOR]->(Git:Log:Author)
(Git:Repository)-[:HAS_FILE]->(Git:Log:File)
The optional parameter --source directory-path-to-the-source-folder-containing-git-repositories can be used to select a different directory for the repositories. By default, the source directory within the analysis workspace directory is used. This command only needs the git history to be present, so git clone --bare is sufficient. If the source directory is also used for code analysis (as for Typescript), a full git clone is of course needed. Additionally, if you want to focus on a specific version or branch, use the git clone options --branch branch-name to check out the branch and --single-branch to exclude other branches before importing the git log data.
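The idea behind the aggregated import, grouping changes by month and counting commits, can be sketched with awk. The input format "YYYY-MM-DD|author|file" (one line per changed file and commit) and the function name are assumptions for illustration; the actual aggregation done by importGit.sh is not shown here.

```shell
#!/usr/bin/env bash
# Sketch (assumption): group changes by month and count commits per author and file.
# Input: one line per changed file and commit, "YYYY-MM-DD|author|file".
# Output: "YYYY-MM|author|file|commitCount", sorted byte-wise for stable results.
aggregate_changes_by_month() {
  awk -F'|' '{ count[substr($1, 1, 7) "|" $2 "|" $3]++ }
    END { for (key in count) print key "|" count[key] }' | LC_ALL=C sort
}
```

Three commits touching the same file in one month collapse into a single line with a count of 3, which is where the reduction in data size comes from.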
IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT supports the values none, aggregated, full and plugin (default). With it, you can switch off the git import (none), import aggregated data for a smaller memory footprint (aggregated), import all commits with git log in a simple way (full) or let a plugin take care of the git data (plugin, which is also used when the value is empty).
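The handling of the allowed values, including the fallback to plugin for an empty value, can be sketched as a simple case statement. git_import_mode and its messages are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: interpret IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT.
# An unset or empty value falls back to "plugin".
git_import_mode() {
  local mode="${IMPORT_GIT_LOG_DATA_IF_SOURCE_IS_PRESENT:-plugin}"
  case "${mode}" in
    none) echo "skip git import" ;;
    aggregated) echo "import monthly aggregated changes" ;;
    full) echo "import all commits with git log" ;;
    plugin) echo "let a plugin import git data" ;;
    *) echo "Unknown value: ${mode}" >&2; return 1 ;;
  esac
}
```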
After git log data has been imported successfully, Add_RESOLVES_TO_relationships_to_git_files_for_Java.cypher is used to try to resolve the imported git file names to code files. This first attempt will cover most cases, but not all of them. With this approach it is, for example, not possible to distinguish identical file names in different Java jars from the git source files of a mono repo.
You can use List_unresolved_git_files.cypher to find code files that couldn't be matched to git file names and List_ambiguous_git_files.cypher to find ambiguously resolved git files. If you have any idea on how to improve this feel free to open an issue.
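Why identical file names in different jars can't be told apart becomes clearer with a simplified suffix-matching sketch. It is not the logic of Add_RESOLVES_TO_relationships_to_git_files_for_Java.cypher; it only assumes that a git file path is matched against code file paths that form a suffix of it. Code files are read as hypothetical "artifact|path" lines: no match means unresolved, more than one match means ambiguous.

```shell
#!/usr/bin/env bash
# Simplified illustration (assumption, not the actual Cypher query): print all code files
# ("artifact|path" lines from stdin) whose path is a suffix of the given git file path,
# starting at a "/" boundary.
resolve_git_file() {
  local git_path="$1"
  awk -F'|' -v g="${git_path}" '{
    p = $2; n = length(p); m = length(g)
    if (n > 0 && n <= m && substr(g, m - n + 1) == p &&
        (n == m || substr(g, m - n, 1) == "/")) print
  }'
}
```

With the code files module-a.jar|org/example/Foo.java and module-b.jar|org/example/Foo.java, the git file module-b/src/main/java/org/example/Foo.java matches both, so it stays ambiguous.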
With the cypher-shell CLI provided by Neo4j, a query stored in a file can be run with the following command.
Be sure to replace path/to/local/neo4j and password with your settings.
cat ./cypher/Get_Graph_Data_Science_Library_Version.cypher | path/to/local/neo4j/bin/cypher-shell -u neo4j -p password --format plain
Query parameters can be added with the option --param. Here is an example:
cat ./cypher/Get_Graph_Data_Science_Library_Version.cypher | path/to/local/neo4j/bin/cypher-shell -u neo4j -p password --format plain --param "{a: 1}"
For a full list of options, use the help function:
path/to/local/neo4j/bin/cypher-shell --help
Use executeQuery.sh to execute a Cypher query from the file given as an argument.
It uses curl and jq to access the HTTP API of Neo4j.
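Query parameters passed as key=value arguments need to end up as JSON in the HTTP request. The sketch below shows one way to convert them; it is an assumption about the general approach, not the code of executeQuery.sh, and parameters_to_json is a hypothetical name. The usage comment assumes the transactional endpoint /db/neo4j/tx/commit of Neo4j 4 and later on the default port 7474.

```shell
#!/usr/bin/env bash
# Sketch (assumption, not executeQuery.sh): convert "key=value" arguments into a JSON object.
# Integers become JSON numbers, everything else becomes a JSON string.
# Quotes and backslashes in values are not escaped, so this only suits simple values.
parameters_to_json() {
  local json="" separator="" argument key value
  local integer_pattern='^-?(0|[1-9][0-9]*)$'
  for argument in "$@"; do
    key="${argument%%=*}"
    value="${argument#*=}"
    if [[ "${value}" =~ ${integer_pattern} ]]; then
      json+="${separator}\"${key}\":${value}"
    else
      json+="${separator}\"${key}\":\"${value}\""
    fi
    separator=","
  done
  echo "{${json}}"
}

# Usage (assumes jq, the default port 7474 and the database "neo4j"):
# body=$(jq -n --arg statement "$(cat ./cypher/Get_Graph_Data_Science_Library_Version.cypher)" \
#   --argjson parameters "$(parameters_to_json a=1)" \
#   '{statements: [{statement: $statement, parameters: $parameters}]}')
# curl -s -u "neo4j:${NEO4J_INITIAL_PASSWORD}" -H "Content-Type: application/json" \
#   -d "${body}" http://localhost:7474/db/neo4j/tx/commit
```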
Here is an example:
./scripts/executeQuery.sh ./cypher/Get_Graph_Data_Science_Library_Version.cypher
Query parameters can be added as arguments after the file name. Here is an example:
./scripts/executeQuery.sh ./cypher/Get_Graph_Data_Science_Library_Version.cypher a=1
The script executeQueryFunctions.sh contains functions that simplify calling executeQuery.sh for different purposes. For example, execute_cypher_summarized prints the results to the console in a summarized manner and execute_cypher_expect_results fails when there are no results.
The script also provides an API abstraction that defaults to HTTP, but can easily be switched to cypher-shell.
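The behavior of execute_cypher_expect_results can be illustrated with a generic wrapper. The sketch below is hypothetical (including the name expect_results) and assumes that the wrapped query command prints one header line followed by one line per result row, as CSV output would.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run a query command, print its output and fail
# if it returned nothing beyond the (assumed) header line.
expect_results() {
  local output
  output=$("$@") || return # propagate a failing query command
  printf '%s\n' "${output}"
  if [[ $(printf '%s\n' "${output}" | wc -l) -le 1 ]]; then
    echo "Error: no results returned by: $*" >&2
    return 1
  fi
}

# Usage (assumes executeQuery.sh prints CSV with a header line):
# expect_results ./scripts/executeQuery.sh ./cypher/Get_Graph_Data_Science_Library_Version.cypher
```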
Query parameters can be added as arguments after the file name. Here is an example:
source "${SCRIPTS_DIR}/executeQueryFunctions.sh"
execute_cypher ./cypher/Get_Graph_Data_Science_Library_Version.cypher a=1
Use stopNeo4j.sh to stop the locally running Neo4j Graph Database. It does nothing if the database is already stopped. It runs the script with a temporary NEO4J_HOME environment variable to not interfere with a possibly globally installed Neo4j installation.
Hint: Within the analysis workspace directory you can run stopNeo4j.sh directly without the ../../ prefix since the script is also directly available in the analysis workspace.
- Conda
- jQAssistant
- Bite-Sized Neo4j for Data Scientists
- Managing environments with Conda
- Neo4j - Download
- Neo4j - HTTP API
- How to Use Conda With Github Actions
- Older database download link (neo4j community)
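Show information about the process that listens on local port 7474 (the default Neo4j HTTP port):
ps -p $( lsof -t -i:7474 -sTCP:LISTEN )
Forcefully kill the process that listens on local port 7474:
kill -9 $( lsof -t -i:7474 -sTCP:LISTEN )
Reference: Neo4j memory estimation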
NEO4J_HOME=tools/neo4j-community-4.4.20 tools/neo4j-community-4.4.20/bin/neo4j-admin memrec