|
1 | | -# StreamCPI: Framework for Streaming Graph Partitioning |
| 1 | +StreamCPI 1.00 |
2 | 2 | [](https://opensource.org/licenses/MIT) |
| 3 | +[](https://isocpp.org/) |
| 4 | +[](https://cmake.org/) |
| 5 | +[](https://github.com/KaHIP/CompressedStreamingGraphPartitioning) |
| 6 | +[](https://github.com/KaHIP/CompressedStreamingGraphPartitioning) |
| 7 | +[](https://github.com/KaHIP/CompressedStreamingGraphPartitioning/stargazers) |
| 8 | +[](https://github.com/KaHIP/CompressedStreamingGraphPartitioning/issues) |
| 9 | +[](https://github.com/KaHIP/CompressedStreamingGraphPartitioning/commits) |
| 10 | +[](https://github.com/KaHIP/homebrew-kahip) |
| 11 | +[](https://arxiv.org/abs/2410.07732) |
| 12 | +[](https://arxiv.org/abs/2410.07732) |
| 13 | +[](https://www.uni-heidelberg.de) |
| 14 | +===== |
| 15 | + |
| 16 | +<p align="center"> |
| 17 | + <img src="https://raw.githubusercontent.com/KaHIP/CompressedStreamingGraphPartitioning/master/logo/streamcpi-banner.png" alt="StreamCPI Banner" width="900"/> |
| 18 | +</p> |
| 19 | + |
| 20 | +**StreamCPI** is a framework for reducing the memory consumption of streaming graph partitioners by compressing block assignments using run-length encoding. Part of the [KaHIP](https://github.com/KaHIP) organization. |
| 21 | + |
| 22 | +| | | |
| 23 | +|:--|:--| |
| 24 | +| **What it solves** | Memory-efficient streaming graph partitioning for trillion-edge graphs on edge devices | |
| 25 | +| **Techniques** | Run-length compressed partition indices (CPI), modified Fennel scoring, batch-wise compression, external memory PQ | |
| 26 | +| **Interfaces** | CLI | |
| 27 | +| **Requires** | C++20, CMake 3.24+, MPI, OpenMP, Argtable | |
| 28 | + |
| 29 | +## Quick Start |
| 30 | + |
| 31 | +### Install via Homebrew |
| 32 | + |
| 33 | +```bash |
| 34 | +brew install KaHIP/kahip/streamcpi |
| 35 | +``` |
3 | 36 |
|
4 | | -## What is **StreamCPI**? |
5 | | -**StreamCPI** is a framework for reducing the memory consumption of streaming graph partitioners by **C**ompressing the array of block assignments |
6 | | -(**P**artition **I**ndices) used by such partitioners. In particular, StreamCPI utilizes run-length data compression to encode runs of repeating block assignments generated by the streaming partitioner on-the-fly. |
7 | | -In this framework, we offer a novel (semi-)dynamic compression vector that functions as a drop-in replacement to standard arrays, like std::vector in C++, used to store block assignments in streaming graph partitioners. |
| 37 | +### Or build from source |
8 | 38 |
|
9 | | -This repository contains the code to accompany our paper: *Adil Chhabra, Florian Kurpicz, Christian Schulz, Dominik Schweisgut, Daniel Seemaier. Partitioning Trillion Edge Graphs on Edge Devices. In SIAM Conference on Applied and Computational Discrete Algorithms (ACDA), to appear, 2025.* |
10 | | -You can find a freely accessible online technical report on arXiv: https://arxiv.org/abs/2410.07732. |
| 39 | +```bash |
| 40 | +git clone --recursive https://github.com/KaHIP/CompressedStreamingGraphPartitioning.git |
| 41 | +cd CompressedStreamingGraphPartitioning |
| 42 | +./compile.sh |
| 43 | +``` |
11 | 44 |
|
12 | | -## Can we use the (semi-)dynamic compression vector to reduce memory consumption in our streaming algorithm? |
13 | | -Yes, if your streaming algorithm stores arrays with repeating values, you can greatly benefit from our compression vector which supports both append and access operations, and is very easy to integrate. The code and more details on how to use the compression vector |
14 | | -are provided in a separate GitHub repository https://github.com/kurpicz/cpi. |
| 45 | +Alternatively, use the standard CMake build process: |
15 | 46 |
|
16 | | -## Which streaming partitioner does StreamCPI use? |
17 | | -In this repository, we build StreamCPI with a modified Fennel partitioning scoring function. However, StreamCPI can be inserted as a subroutine to reduce the memory footprint in any streaming graph |
18 | | -partitioning tool that requires to store block ids in a vector of size $\Theta(n)$. |
| 47 | +```bash |
| 48 | +mkdir build && cd build |
| 49 | +cmake .. -DCMAKE_BUILD_TYPE=Release |
| 50 | +make -j$(nproc) |
| 51 | +``` |
19 | 52 |
|
20 | | -## Installation Notes |
| 53 | +The resulting binaries are `deploy/stream_cpi` and `deploy/stream_cpi_generated`. |
21 | 54 |
|
22 | | -### Requirements |
| 55 | +### Run |
23 | 56 |
|
24 | | -* C++-14 ready compiler (g++ version 10+) |
25 | | -* CMake |
26 | | -* Scons (http://www.scons.org/) |
27 | | -* Argtable (http://argtable.sourceforge.net/) |
| 57 | +```bash |
| 58 | +# Partition a METIS graph into k blocks |
| 59 | +./deploy/stream_cpi <graph> --k=<number of blocks> |
28 | 60 |
|
29 | | -### Building StreamCPI |
| 61 | +# With full run-length compression (recommended) |
| 62 | +./deploy/stream_cpi <graph> --k=<number of blocks> --rle_length=0 |
30 | 63 |
|
31 | | -To build the software, run |
32 | | -```shell |
33 | | -./compile.sh |
| 64 | +# With kappa scaling for further memory reduction |
| 65 | +./deploy/stream_cpi <graph> --k=<number of blocks> --rle_length=0 --kappa=20 |
| 66 | + |
| 67 | +# Full parameter list |
| 68 | +./deploy/stream_cpi --help |
34 | 69 | ``` |
35 | 70 |
|
36 | | -Alternatively, you can use the standard CMake build process. |
| 71 | +--- |
37 | 72 |
|
38 | | -The resulting binary is located in `deploy/stream_cpi` and `deploy/stream_cpi_generated`. |
| 73 | +## Compression Modes |
39 | 74 |
|
40 | | -## Running StreamCPI |
| 75 | +The `--rle_length` flag selects the compression mode: |
41 | 76 |
|
42 | | -To partition a graph in METIS format using StreamCPI, run |
| 77 | +| rle_length | Mode | |
| 78 | +|-------------|------------------------------------------------------------------------------------------| |
| 79 | +| 0 | Complete run-length compression (recommended) | |
| 80 | +| -1 | std::vector (fastest, no compression) | |
| 81 | +| -2 | External memory PQ (using STXXL, configurable in `lib/data_structure/ExternalPQ.h`) | |
| 82 | +| 100+ | Batch-wise compression: each compression vector handles `rle_length` nodes | |
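
The mapping in the table above can be sketched in a few lines of C++. This is only an illustration: `BlockStorage`, `storage_for`, and `batch_of` are hypothetical names that do not come from the StreamCPI sources, and the batch layout (node `v` handled by vector `v / rle_length`) is an assumption read off the table's wording. Values the table does not list are reported as unspecified rather than guessed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not taken from the StreamCPI sources.
enum class BlockStorage { RunLength, StdVector, ExternalPQ, BatchRunLength, Unspecified };

// Map a --rle_length value to the storage mode listed in the table.
constexpr BlockStorage storage_for(std::int64_t rle_length) {
    if (rle_length == 0) return BlockStorage::RunLength;
    if (rle_length == -1) return BlockStorage::StdVector;
    if (rle_length == -2) return BlockStorage::ExternalPQ;
    if (rle_length >= 100) return BlockStorage::BatchRunLength;
    return BlockStorage::Unspecified;  // values the table does not cover
}

// Assumed batch layout: node v is handled by compression vector v / rle_length.
constexpr std::uint64_t batch_of(std::uint64_t node, std::uint64_t rle_length) {
    return node / rle_length;
}
```

Under this assumed layout, `--rle_length=100` would place nodes 0 to 99 in the first compression vector and nodes 100 to 199 in the second.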
43 | 83 |
|
44 | | -```shell |
45 | | -./stream_cpi <graph filename> --k=<number of blocks> |
46 | | -``` |
47 | | -By default, the partitioner stores the resulting block assignments in a file identified by `graph_k.txt`. To obtain more information pertaining to the quality of the partition, such as, edge cut, running time, memory consumption, etc., pass the flag `--write_results`. |
| 84 | +## CPI Compression Vector |
48 | 85 |
|
49 | | -To partition a graph in METIS format using the StreamCPI with complete run length compression, run |
| 86 | +The (semi-)dynamic compression vector can be used as a drop-in replacement for `std::vector` in any streaming algorithm that stores arrays with repeating values. The standalone library is available at [kurpicz/cpi](https://github.com/kurpicz/cpi). |
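
To make the idea concrete, here is a minimal sketch of an append-only, run-length encoded vector with random access. It is not the API of kurpicz/cpi: the class name `RunLengthVector` and its members are hypothetical, and the real library may use a different layout. The sketch stores one entry per run of equal values and answers lookups with a binary search over the run start positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration of run-length encoding; not the kurpicz/cpi API.
class RunLengthVector {
public:
    // Append one value; a repeated value extends the last run at no extra cost.
    void push_back(std::uint32_t value) {
        if (values_.empty() || values_.back() != value) {
            starts_.push_back(size_);  // a new run begins at this logical index
            values_.push_back(value);
        }
        ++size_;
    }

    // Return the value at logical index i using binary search over run starts.
    std::uint32_t operator[](std::size_t i) const {
        assert(i < size_);
        // Find the first run that starts after i; the run before it contains i.
        auto it = std::upper_bound(starts_.begin(), starts_.end(), i);
        return values_[static_cast<std::size_t>(it - starts_.begin()) - 1];
    }

    std::size_t size() const { return size_; }
    std::size_t num_runs() const { return values_.size(); }

private:
    std::vector<std::size_t> starts_;    // logical index where each run begins
    std::vector<std::uint32_t> values_;  // value (e.g. block ID) stored once per run
    std::size_t size_ = 0;               // number of logical elements
};
```

In this sketch, memory grows with the number of runs rather than the number of elements, which is why longer runs of identical block assignments compress better. Access costs a binary search over the runs instead of the constant-time lookup of `std::vector`.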
50 | 87 |
|
51 | | -```shell |
52 | | -./stream_cpi <graph filename> --k=<number of blocks> --rle_length=<mode, eg., 0, refer to table below> |
53 | | -``` |
| 88 | +## Streaming Graph Generator |
54 | 89 |
|
55 | | -The `--rle_length` flag can be set to various values depending on which mode you wish to select. Refer to the following table. |
| 90 | +The included `stream_cpi_generated` binary partitions graphs that a streaming graph generator produces on the fly: |
56 | 91 |
|
57 | | -| rle_length | mode | |
58 | | -|-------------|------------------------------------------------------------------------------------------| |
59 | | -| 0 | complete run length compression (recommended) | |
60 | | -| -1 | std::vector (fastest) | |
61 | | -| -2 | external memory PQ (using stxxl, easily configurable in lib/data_structure/ExternalPQ.h) | |
62 | | -| 100 or more | batch-wise compression: each compression vector is responsible for rle_length nodes | |
63 | | - |
64 | | -To further enhance memory reduction and faster runtime, pass a flag `--kappa` to encourage repeated block assignments. |
| 92 | +```bash |
| 93 | +# Barabasi-Albert graph |
| 94 | +./deploy/stream_cpi_generated <output> --k=<blocks> --rle_length=0 --kappa=20 \ |
| 95 | + --ba --nodes_to_generate=<n> --kagen_d_ba=<avg_degree> --kagen_chunk_count=<chunks> |
65 | 96 |
|
66 | | -```shell |
67 | | -./stream_cpi <graph filename> --k=<number of blocks> --rle_length=<mode, eg., 0> --kappa=<scale factor, eg. 20> |
| 97 | +# RGG2D graph |
| 98 | +./deploy/stream_cpi_generated <output> --k=<blocks> --rle_length=0 --kappa=20 \ |
| 99 | + --rgg2d --nodes_to_generate=<n> --kagen_r=<radius> --kagen_chunk_count=<chunks> |
68 | 100 | ``` |
69 | | - |
70 | | -For a complete list of parameters alongside with descriptions, run: |
71 | 101 |
|
72 | | -```shell |
73 | | -./stream_cpi --help |
74 | | -``` |
| 102 | +See [adilchhabra/KaGen](https://github.com/adilchhabra/KaGen) for graph generation models and parameters. |
75 | 103 |
|
76 | | -Note: |
77 | | -- The program stores the results of the executed command in a [flatbuffer](https://github.com/google/flatbuffers) `.bin` |
78 | | - file identified by `graph_k_kappa.bin` if you pass the flag `write_results`. |
79 | | -- To partition graphs in StreamCPI with 64 bit vertex IDs, edit the CMakeLists.txt file to change `Line 70: option(64BITVERTEXMODE "64 bit mode" OFF)` to |
80 | | - `Line 70: option(64BITVERTEXMODE "64 bit mode" ON)`, and then run `./compile.sh`. By default, 64 bit vertex IDs are enabled. |
81 | | -- For a description of the METIS graph format, please have a look at the [KaHiP manual](https://github.com/KaHIP/KaHIP/raw/master/manual/kahip.pdf). |
| 104 | +--- |
82 | 105 |
|
83 | | -## Data References |
84 | | -In our work, we performed experiments with graphs sourced from the following repositories: |
85 | | -- SNAP Dataset: https://snap.stanford.edu/data/ |
86 | | -- 10th Dimacs Challenge: https://sites.cc.gatech.edu/dimacs10/downloads.shtml |
87 | | -- Laboratory for Web Algorithmics: https://law.di.unimi.it/datasets.php |
88 | | -- Network Repository Website: https://networkrepository.com/ |
| 106 | +## Notes |
89 | 107 |
|
90 | | -For our experiments, we converted these graphs to the METIS format, while removing parallel edges, self-loops, and directions, and assigning unitary weight to all nodes and edges. |
| 108 | +- Results are stored as [FlatBuffer](https://github.com/google/flatbuffers) `.bin` files when `--write_results` is passed. |
| 109 | +- 64-bit vertex IDs are enabled by default. To disable them, set `64BITVERTEXMODE` to `OFF` in `CMakeLists.txt` and rebuild with `./compile.sh`. |
| 110 | +- For the METIS graph format, refer to the [KaHIP manual](https://github.com/KaHIP/KaHIP/raw/master/manual/kahip.pdf). |
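
As a rough orientation for the METIS format, the sketch below reads an unweighted METIS graph one node at a time, which is the access pattern a streaming partitioner needs. It assumes the plain variant of the format: a header line with the node and edge counts, then one line per node listing its 1-based neighbor IDs, with `%` starting a comment line. `MetisHeader`, `next_data_line`, and `stream_metis` are hypothetical names, not StreamCPI functions, and weighted graphs are not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper types for illustration; not part of StreamCPI.
struct MetisHeader {
    std::uint64_t nodes = 0;
    std::uint64_t edges = 0;
};

// Read the next line that is not a '%' comment; returns false at end of input.
// Empty lines are kept because they describe nodes without neighbors.
inline bool next_data_line(std::istream& in, std::string& line) {
    while (std::getline(in, line)) {
        if (line.empty() || line[0] != '%') return true;
    }
    return false;
}

// Stream an unweighted METIS graph, calling visit(node, neighbors) per node.
// Node and neighbor IDs passed to the visitor are converted to 0-based.
template <typename Visitor>
MetisHeader stream_metis(std::istream& in, Visitor&& visit) {
    MetisHeader header;
    std::string line;
    if (!next_data_line(in, line)) return header;
    std::istringstream header_fields(line);
    header_fields >> header.nodes >> header.edges;

    std::vector<std::uint64_t> neighbors;
    for (std::uint64_t v = 0; v < header.nodes && next_data_line(in, line); ++v) {
        neighbors.clear();
        std::istringstream fields(line);
        for (std::uint64_t u; fields >> u;) neighbors.push_back(u - 1);
        visit(v, neighbors);
    }
    return header;
}
```

Only one adjacency list is held in memory at a time, so the reader itself needs space proportional to the largest node degree rather than to the whole graph.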
91 | 111 |
|
92 | | -## Additional Information |
93 | | -This repository includes another program, `deploy/stream_cpi_generated`, in which a user can partition a graph generated on-the-fly with a novel **streaming graph generator**. The streaming graph generator is also |
94 | | -made available open-source in the following GitHub repository: https://github.com/adilchhabra/KaGen, which includes instructions on how a user can experiment with various graph generation models in a streaming setting. |
95 | | -This has wide applicability across all streaming algorithms under development for experimentation and testing. Soon, this streaming generator will be integrated into the popular |
96 | | -KaGen graph generation repository: https://github.com/KarlsruheGraphGeneration/KaGen. |
| 112 | +## Data References |
97 | 113 |
|
98 | | -To partition a generated Barabasi-Albert graph using StreamCPI, run |
| 114 | +Graphs used in our experiments were sourced from: |
| 115 | +- [SNAP Dataset](https://snap.stanford.edu/data/) |
| 116 | +- [10th DIMACS Challenge](https://sites.cc.gatech.edu/dimacs10/downloads.shtml) |
| 117 | +- [Laboratory for Web Algorithmics](https://law.di.unimi.it/datasets.php) |
| 118 | +- [Network Repository](https://networkrepository.com/) |
99 | 119 |
|
100 | | -```shell |
101 | | -./stream_cpi_generated <partition_output_filename> --k=<number of blocks> --rle_length=<mode> --kappa=<scaling factor> --ba --nodes_to_generate=<n> --kagen_d_ba=<avg. deg. of BA graph generation> --kagen_chunk_count=<num. of chunks within which to generate graph> |
102 | | -``` |
| 120 | +--- |
103 | 121 |
|
104 | | -To partition a generated RGG2D graph using StreamCPI, run |
| 122 | +## Citing |
105 | 123 |
|
106 | | -```shell |
107 | | -./stream_cpi_generated <partition_output_filename> --k=<number of blocks> --rle_length=<mode> --kappa=<scaling factor> --rgg2d --nodes_to_generate=<n> --kagen_r=<radius of RGG graph generation> --kagen_chunk_count=<num. of chunks within which to generate graph> |
| 124 | +If you use StreamCPI in your research, please cite: |
| 125 | + |
| 126 | +```bibtex |
| 127 | +@inproceedings{chhabra2025streamcpi, |
| 128 | + title = {Partitioning Trillion Edge Graphs on Edge Devices}, |
| 129 | + author = {Adil Chhabra and Florian Kurpicz and Christian Schulz and Dominik Schweisgut and Daniel Seemaier}, |
| 130 | + booktitle = {SIAM Conference on Applied and Computational Discrete Algorithms (ACDA)}, |
| 131 | + year = {2025} |
| 132 | +} |
108 | 133 | ``` |
109 | 134 |
|
110 | | -Please refer to https://github.com/adilchhabra/KaGen to learn more about the graph generation models and their corresponding parameters. |
| 135 | +## Licensing |
| 136 | + |
| 137 | +StreamCPI is distributed under the MIT License. See [LICENSE](LICENSE) for details. |