|
1 | 1 | # SimKit Experiments with Neo4j & Python |
2 | 2 |
|
3 | | -This repository demonstrates how to run spectral clustering experiments using both the SimKit plugin (loaded into Neo4j) and scikit-learn. It includes: |
4 | | -- A Dockerfile to build a Neo4j container with the SimKit plugin. |
5 | | -- The `experiments_2.py` script which runs experiments against the Neo4j instance. |
| 3 | +This repository demonstrates how to run spectral clustering experiments using both the SimKit plugin (loaded into Neo4j) and scikit-learn. It also includes experiments using Neo4j Graph Data Science (GDS) algorithms. |
| 4 | + |
| 5 | +## Features |
| 6 | +- Dockerized Neo4j setup with the SimKit plugin. |
| 7 | +- Scripts to compare clustering results using: |
| 8 | + - SimKit (custom Neo4j plugin) |
| 9 | + - scikit-learn |
| 10 | + - Neo4j GDS library |
| 11 | +- Batch and timing experiments for performance comparison. |
| 12 | + |
| 13 | +## Contents |
| 14 | +- Dockerfile and docker-compose.yml: Builds and runs a Neo4j instance with the SimKit plugin. |
| 15 | +- requirements.txt: Python dependencies. |
| 16 | +- SimKit-0.1.1.jar: SimKit plugin (must be compatible with the Neo4j version used). |
| 17 | +- Python scripts: |
| 18 | + - experiment_gds.py: Runs k-means clustering using Neo4j GDS. |
| 19 | + - experiments_simkit-0.1.1.py: Runs and times clustering using SimKit and scikit-learn. |
6 | 20 |
|
7 | 21 | ## Prerequisites |
8 | | - |
9 | | -- Docker must be installed on your system. |
| 22 | +- Docker installed on your system. |
10 | 23 | - Python 3.x and pip installed. |
11 | | -- Ensure your system has enough memory (Neo4j can be memory intensive). |
12 | | -- Place the `simkit.jar` plugin file (compatible with your Neo4j version) in the same folder as the Dockerfile. |
13 | | - |
14 | | -## Building and Running the Neo4j Docker Container |
15 | | - |
16 | | -1. **Build the Docker Image** |
17 | | - |
18 | | - Open a terminal in the directory containing the Dockerfile and `simkit.jar` and run: |
19 | | - |
20 | | - ```bash |
21 | | - docker build -t my-neo4j . |
22 | | - ``` |
23 | | - |
24 | | - This builds a Docker image named `my-neo4j` that includes the SimKit plugin. |
25 | | - |
26 | | -2. **Run the Docker Container** |
27 | | - |
28 | | - Start the container with: |
29 | | - |
30 | | - ```bash |
31 | | - docker run --name neo4j -p 7687:7687 -p 7474:7474 -d my-neo4j |
32 | | - ``` |
33 | | - |
34 | | - - **Ports:** |
35 | | - - `7687` is the Bolt port for Neo4j. |
36 | | - - `7474` is the HTTP port (Neo4j Browser). |
37 | | - |
38 | | -3. **Verify the Container** |
39 | | - |
40 | | - - Check logs for any errors (especially plugin errors): |
41 | | - |
42 | | - ```bash |
43 | | - docker logs neo4j |
44 | | - ``` |
45 | | - |
46 | | - - Open [http://localhost:7474](http://localhost:7474) in your browser to access the Neo4j Browser and verify that the SimKit procedures are available (e.g., try a test query like `RETURN simkit.experimental_spectralClustering({ ... })`). |
47 | | - |
48 | | -## Running the Experiments |
49 | | - |
50 | | -The `experiments_2.py` script runs spectral clustering experiments using both SimKit (via Neo4j procedures) and scikit-learn spectral clustering. |
51 | | - |
52 | | -1. **Install Python Dependencies** |
53 | | - |
54 | | - The script will attempt to install missing packages automatically. Alternatively, install manually: |
55 | | - |
56 | | - ```bash |
57 | | - pip install neo4j pandas psutil tqdm scikit-learn scipy |
58 | | - ``` |
59 | | - |
60 | | -2. **Prepare Datasets** |
61 | | - |
62 | | - Place your dataset CSV files under the `datasets/` directory. The script expects file names such as `iris.csv`, `cora_nodes.csv`, `cora_edges.csv`, etc. |
63 | | - |
64 | | -3. **Run the Experiment Script** |
65 | | - |
66 | | - Ensure the Neo4j container is running, then execute: |
67 | | - |
68 | | - ```bash |
69 | | - python experiments_2.py |
70 | | - ``` |
71 | | - |
72 | | - The script will: |
73 | | - - Delete existing nodes and indexes in Neo4j. |
74 | | - - Create feature/graph nodes from the datasets. |
75 | | - - Run experiments using both SimKit and scikit-learn. |
76 | | - - Save results as CSV files in a `results/` directory. |
| 24 | +- Memory: Ensure your system has enough RAM (Neo4j can be memory-intensive). |
| 25 | +- Place the simkit.jar plugin (compatible with your Neo4j version) in the same directory as the Dockerfile. |
77 | 26 |
|
78 | | -## Troubleshooting |
| 27 | +## Setup Instructions |
| 28 | +1. Clone the repository: |
79 | 29 |
|
80 | | -- **Container Exits Early:** |
81 | | - Check the container logs with: |
| 30 | + ```bash |
| 31 | + git clone https://github.com/yourusername/simkit-experiments.git |
| 32 | + cd simkit-experiments |
| 33 | + ``` |
82 | 34 |
|
83 | | - ```bash |
84 | | - docker logs neo4j |
85 | | - ``` |
| 35 | +2. Install Python dependencies: |
| 36 | + ```bash |
| 37 | + pip install -r requirements.txt |
| 38 | + ``` |
86 | 39 |
|
87 | | - Review for errors related to plugin incompatibility, configuration issues, or memory constraints. |
| 40 | +3. Start the Neo4j database: |
| 41 | + ```bash |
| 42 | + docker compose up |
| 43 | + ``` |
88 | 44 |
|
89 | | -- **SimKit Procedure Not Found:** |
90 | | - Ensure the `simkit.jar` file is correctly placed and compatible with the Neo4j version used. |
| 45 | + This will build and start the Neo4j container with SimKit. |
91 | 46 |
|
92 | | -- **Python Errors:** |
93 | | - Verify that all dataset files exist and have the expected schema. |
94 | 47 |
|
95 | | -## Cleanup |
| 48 | +4. Run experiments: |
| 49 | + - Run GDS clustering experiments on Neo4j: |
| 50 | + ```bash |
| 51 | + python experiment_gds.py |
| 52 | + ``` |
96 | 53 |
|
97 | | -To stop and remove the Docker container: |
| 54 | + - Run SimKit and scikit-learn timing experiments: |
| 55 | + ```bash |
| 56 | + python experiments_simkit-0.1.1.py |
| 57 | + ``` |
98 | 58 |
|
99 | | -```bash |
100 | | -docker stop neo4j |
101 | | -docker rm neo4j |
102 | | -``` |
| 59 | +## Notes |
| 60 | +- Ensure the Neo4j container is fully up before starting any experiments. |
| 61 | +- The scripts assume the Neo4j instance is reachable at the configured bolt:// address with the default credentials (neo4j/neo4j); update the scripts if you have changed them. |
| 62 | +- You may need to load or generate a sample graph dataset in Neo4j before running experiments. |
103 | 63 |
|
104 | | -To remove the Docker image: |
| 64 | +## License |
105 | 65 |
|
106 | | -```bash |
107 | | -docker rmi my-neo4j |
108 | | -``` |
| 66 | +This project is licensed under the Apache License 2.0. |
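
The Notes in the new README ask that the Neo4j container be fully up before any experiment starts. Below is a minimal sketch of how that wait could be automated. It is not part of the repository: the helper names `wait_until_ready` and `neo4j_check` are hypothetical, the Bolt address `bolt://localhost:7687` is assumed from the port mapping in the removed instructions, and the `neo4j/neo4j` credentials are taken from the note and may not match your setup.

```python
import time


def wait_until_ready(check, timeout=60.0, interval=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it stops raising or the timeout expires.

    Returns True once check() succeeds and False if the deadline passes first.
    The sleep and clock arguments exist only so the helper is easy to test.
    """
    deadline = clock() + timeout
    while True:
        try:
            check()
            return True
        except Exception:
            # Any exception is treated as "not ready yet" in this sketch.
            if clock() >= deadline:
                return False
            sleep(interval)


def neo4j_check(uri="bolt://localhost:7687", auth=("neo4j", "neo4j")):
    """Build a check() that raises while Neo4j is not reachable over Bolt.

    The URI and credentials are assumptions; adjust them to your setup.
    """
    # Imported lazily so wait_until_ready can be used without the driver.
    from neo4j import GraphDatabase

    def check():
        with GraphDatabase.driver(uri, auth=auth) as driver:
            driver.verify_connectivity()

    return check
```

One possible use is to call `wait_until_ready(neo4j_check())` after `docker compose up` and to launch `experiment_gds.py` or `experiments_simkit-0.1.1.py` only if it returns `True`.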