Commit 8d5d831

Modernize README and add Homebrew formula
Restructure README to match KaHIP org conventions: badges, feature table, quick start with Homebrew, BibTeX citation, banner placeholder. Add streamcpi.rb Homebrew formula for the KaHIP/homebrew-kahip tap.
1 parent 6315575 commit 8d5d831

2 files changed, 146 additions and 77 deletions

README.md

Lines changed: 104 additions & 77 deletions
@@ -1,110 +1,137 @@
1-
# StreamCPI: Framework for Streaming Graph Partitioning
1+
StreamCPI 1.00
22
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
3+
[![C++](https://img.shields.io/badge/C++-20-blue.svg)](https://isocpp.org/)
4+
[![CMake](https://img.shields.io/badge/CMake-3.24+-064F8C.svg)](https://cmake.org/)
5+
[![Linux](https://img.shields.io/badge/Linux-supported-success.svg)](https://github.com/KaHIP/CompressedStreamingGraphPartitioning)
6+
[![macOS](https://img.shields.io/badge/macOS-supported-success.svg)](https://github.com/KaHIP/CompressedStreamingGraphPartitioning)
7+
[![GitHub Stars](https://img.shields.io/github/stars/KaHIP/CompressedStreamingGraphPartitioning)](https://github.com/KaHIP/CompressedStreamingGraphPartitioning/stargazers)
8+
[![GitHub Issues](https://img.shields.io/github/issues/KaHIP/CompressedStreamingGraphPartitioning)](https://github.com/KaHIP/CompressedStreamingGraphPartitioning/issues)
9+
[![Last Commit](https://img.shields.io/github/last-commit/KaHIP/CompressedStreamingGraphPartitioning)](https://github.com/KaHIP/CompressedStreamingGraphPartitioning/commits)
10+
[![Homebrew](https://img.shields.io/badge/Homebrew-available-orange)](https://github.com/KaHIP/homebrew-kahip)
11+
[![arXiv](https://img.shields.io/badge/arXiv-2410.07732-b31b1b.svg)](https://arxiv.org/abs/2410.07732)
12+
[![ACDA'25](https://img.shields.io/badge/ACDA'25-published-blue)](https://arxiv.org/abs/2410.07732)
13+
[![Heidelberg University](https://img.shields.io/badge/Heidelberg-University-c1002a)](https://www.uni-heidelberg.de)
14+
=====
15+
16+
<p align="center">
17+
<img src="https://raw.githubusercontent.com/KaHIP/CompressedStreamingGraphPartitioning/master/logo/streamcpi-banner.png" alt="StreamCPI Banner" width="900"/>
18+
</p>
19+
20+
**StreamCPI** is a framework for reducing the memory consumption of streaming graph partitioners by compressing block assignments using run-length encoding. Part of the [KaHIP](https://github.com/KaHIP) organization.
21+
22+
| | |
23+
|:--|:--|
24+
| **What it solves** | Memory-efficient streaming graph partitioning for trillion-edge graphs on edge devices |
25+
| **Techniques** | Run-length compressed partition indices (CPI), modified Fennel scoring, batch-wise compression, external memory PQ |
26+
| **Interfaces** | CLI |
27+
| **Requires** | C++20, CMake 3.24+, MPI, OpenMP, Argtable |
28+
29+
## Quick Start
30+
31+
### Install via Homebrew
32+
33+
```bash
34+
brew install KaHIP/kahip/streamcpi
35+
```
336

4-
## What is **StreamCPI**?
5-
**StreamCPI** is a framework for reducing the memory consumption of streaming graph partitioners by **C**ompressing the array of block assignments
6-
(**P**artition **I**ndices) used by such partitioners. In particular, StreamCPI utilizes run-length data compression to encode runs of repeating block assignments generated by the streaming partitioner on-the-fly.
7-
In this framework, we offer a novel (semi-)dynamic compression vector that functions as a drop-in replacement to standard arrays, like std::vector in C++, used to store block assignments in streaming graph partitioners.
37+
### Or build from source
838

9-
This repository contains the code to accompany our paper: *Adil Chhabra, Florian Kurpicz, Christian Schulz, Dominik Schweisgut, Daniel Seemaier. Partitioning Trillion Edge Graphs on Edge Devices. In SIAM Conference on Applied and Computational Discrete Algorithms (ACDA), to appear, 2025.*
10-
You can find a freely accessible online technical report on arXiv: https://arxiv.org/abs/2410.07732.
39+
```bash
40+
git clone --recursive https://github.com/KaHIP/CompressedStreamingGraphPartitioning.git
41+
cd CompressedStreamingGraphPartitioning
42+
./compile.sh
43+
```
1144

12-
## Can we use the (semi-)dynamic compression vector to reduce memory consumption in our streaming algorithm?
13-
Yes, if your streaming algorithm stores arrays with repeating values, you can greatly benefit from our compression vector which supports both append and access operations, and is very easy to integrate. The code and more details on how to use the compression vector
14-
are provided in a seperate GitHub repository https://github.com/kurpicz/cpi.
45+
Alternatively, use the standard CMake build process:
1546

16-
## Which streaming partitioner does StreamCPI use?
17-
In this repository, we build StreamCPI with a modified Fennel partitioning scoring function. However, StreamCPI can be inserted as a subroutine to reduce the memory footprint in any streaming graph
18-
partitioning tool that requires to store block ids in a vector of size $\Theta(n)$.
47+
```bash
48+
mkdir build && cd build
49+
cmake .. -DCMAKE_BUILD_TYPE=Release
50+
make -j$(nproc)
51+
```
1952

20-
## Installation Notes
53+
The resulting binaries are `deploy/stream_cpi` and `deploy/stream_cpi_generated`.
2154

22-
### Requirements
55+
### Run
2356

24-
* C++-14 ready compiler (g++ version 10+)
25-
* CMake
26-
* Scons (http://www.scons.org/)
27-
* Argtable (http://argtable.sourceforge.net/)
57+
```bash
58+
# Partition a METIS graph into k blocks
59+
./deploy/stream_cpi <graph> --k=<number of blocks>
2860

29-
### Building StreamCPI
61+
# With full run-length compression (recommended)
62+
./deploy/stream_cpi <graph> --k=<number of blocks> --rle_length=0
3063

31-
To build the software, run
32-
```shell
33-
./compile.sh
64+
# With kappa scaling for further memory reduction
65+
./deploy/stream_cpi <graph> --k=<number of blocks> --rle_length=0 --kappa=20
66+
67+
# Full parameter list
68+
./deploy/stream_cpi --help
3469
```
3570

36-
Alternatively, you can use the standard CMake build process.
71+
---
3772

38-
The resulting binary is located in `deploy/stream_cpi` and `deploy/stream_cpi_generated`.
73+
## Compression Modes
3974

40-
## Running StreamCPI
75+
The `--rle_length` flag selects the compression mode:
4176

42-
To partition a graph in METIS format using StreamCPI, run
77+
| rle_length | Mode |
78+
|-------------|------------------------------------------------------------------------------------------|
79+
| 0 | Complete run-length compression (recommended) |
80+
| -1 | std::vector (fastest, no compression) |
81+
| -2 | External memory PQ (using STXXL, configurable in `lib/data_structure/ExternalPQ.h`) |
82+
| 100+ | Batch-wise compression: each compression vector handles `rle_length` nodes |
4383

44-
```shell
45-
./stream_cpi <graph filename> --k=<number of blocks>
46-
```
47-
By default, the partitioner stores the resulting block assignments in a file identified by `graph_k.txt`. To obtain more information pertaining to the quality of the partition, such as, edge cut, running time, memory consumption, etc., pass the flag `--write_results`.
84+
## CPI Compression Vector
4885

49-
To partition a graph in METIS format using the StreamCPI with complete run length compression, run
86+
The (semi-)dynamic compression vector can be used as a drop-in replacement for `std::vector` in any streaming algorithm that stores arrays with repeating values. The standalone library is available at [kurpicz/cpi](https://github.com/kurpicz/cpi).
5087

51-
```shell
52-
./stream_cpi <graph filename> --k=<number of blocks> --rle_length=<mode, eg., 0, refer to table below>
53-
```
88+
## Streaming Graph Generator
5489

55-
The `--rle_length` flag can be set to various values depending on which mode you wish to select. Refer to the following table.
90+
The included `stream_cpi_generated` binary partitions graphs generated on-the-fly using a streaming graph generator:
5691

57-
| rle_length | mode |
58-
|-------------|------------------------------------------------------------------------------------------|
59-
| 0 | complete run length compression (recommended) |
60-
| -1 | std::vector (fastest) |
61-
| -2 | external memory PQ (using stxxl, easily configurable in lib/data_structure/ExternalPQ.h) |
62-
| 100 or more | batch-wise compression: each compression vector is responsible for rle_length nodes |
63-
64-
To further enhance memory reduction and faster runtime, pass a flag `--kappa` to encourage repeated block assignments.
92+
```bash
93+
# Barabasi-Albert graph
94+
./deploy/stream_cpi_generated <output> --k=<blocks> --rle_length=0 --kappa=20 \
95+
--ba --nodes_to_generate=<n> --kagen_d_ba=<avg_degree> --kagen_chunk_count=<chunks>
6596

66-
```shell
67-
./stream_cpi <graph filename> --k=<number of blocks> --rle_length=<mode, eg., 0> --kappa=<scale factor, eg. 20>
97+
# RGG2D graph
98+
./deploy/stream_cpi_generated <output> --k=<blocks> --rle_length=0 --kappa=20 \
99+
--rgg2d --nodes_to_generate=<n> --kagen_r=<radius> --kagen_chunk_count=<chunks>
68100
```
69-
70-
For a complete list of parameters alongside with descriptions, run:
71101

72-
```shell
73-
./stream_cpi --help
74-
```
102+
See [adilchhabra/KaGen](https://github.com/adilchhabra/KaGen) for graph generation models and parameters.
75103

76-
Note:
77-
- The program stores the results of the executed command in a [flatbuffer](https://github.com/google/flatbuffers) `.bin`
78-
file identified by `graph_k_kappa.bin` if you pass the flag `write_results`.
79-
- To partition graphs in StreamCPI with 64 bit vertex IDs, edit the CMakeLists.txt file to change `Line 70: option(64BITVERTEXMODE "64 bit mode" OFF)` to
80-
`Line 70: option(64BITVERTEXMODE "64 bit mode" ON)`, and then run `./compile.sh`. By default, 64 bit vertex IDs are enabled.
81-
- For a description of the METIS graph format, please have a look at the [KaHiP manual](https://github.com/KaHIP/KaHIP/raw/master/manual/kahip.pdf).
104+
---
82105

83-
## Data References
84-
In our work, we performed experiments with graphs sourced from the following repositories:
85-
- SNAP Dataset: https://snap.stanford.edu/data/
86-
- 10th Dimacs Challenge: https://sites.cc.gatech.edu/dimacs10/downloads.shtml
87-
- Laboratory for Web Algorithmics: https://law.di.unimi.it/datasets.php
88-
- Network Repository Website: https://networkrepository.com/
106+
## Notes
89107

90-
For our experiments, we converted these graphs to the METIS format, while removing parallel edges, self-loops, and directions, and assigning unitary weight to all nodes and edges.
108+
- Results are stored as [FlatBuffer](https://github.com/google/flatbuffers) `.bin` files when passing `--write_results`.
109+
- 64-bit vertex IDs are enabled by default. To disable, set `64BITVERTEXMODE` to `OFF` in `CMakeLists.txt`.
110+
- For the METIS graph format, refer to the [KaHIP manual](https://github.com/KaHIP/KaHIP/raw/master/manual/kahip.pdf).
91111

92-
## Additional Information
93-
This repository includes another program, `deploy/stream_cpi_generated`, in which a user can partition a graph generated on-the-fly with a novel **streaming graph generator**. The streaming graph generator is also
94-
made available open-source in the following GitHub repository: https://github.com/adilchhabra/KaGen, which includes instructions on how a user can experiment with various graph generation models in a streaming setting.
95-
This has wide applicability across all streaming algorithms under development for experimentation and testing. Soon, this streaming generator will be integrated into the popular
96-
KaGen graph generation repository: https://github.com/KarlsruheGraphGeneration/KaGen.
112+
## Data References
97113

98-
To partition a generated Barabassi-Albert graph using StreamCPI, run
114+
Graphs used in our experiments were sourced from:
115+
- [SNAP Dataset](https://snap.stanford.edu/data/)
116+
- [10th DIMACS Challenge](https://sites.cc.gatech.edu/dimacs10/downloads.shtml)
117+
- [Laboratory for Web Algorithmics](https://law.di.unimi.it/datasets.php)
118+
- [Network Repository](https://networkrepository.com/)
99119

100-
```shell
101-
./stream_cpi_generated <partition_output_filename> --k=<number of blocks> --rle_length=<mode> --kappa=<scaling factor> --ba --nodes_to_generate=<n> --kagen_d_ba=<avg. deg. of BA graph generation> --kagen_chunk_count=<num. of chunks within which to generate graph>
102-
```
120+
---
103121

104-
To partition a generated RGG2D graph using StreamCPI, run
122+
## Citing
105123

106-
```shell
107-
./stream_cpi_generated <partition_output_filename> --k=<number of blocks> --rle_length=<mode> --kappa=<scaling factor> --rgg2d --nodes_to_generate=<n> --kagen_r=<radius of RGG graph generation> --kagen_chunk_count=<num. of chunks within which to generate graph>
124+
If you use StreamCPI in your research, please cite:
125+
126+
```bibtex
127+
@inproceedings{chhabra2025streamcpi,
128+
title = {Partitioning Trillion Edge Graphs on Edge Devices},
129+
author = {Adil Chhabra and Florian Kurpicz and Christian Schulz and Dominik Schweisgut and Daniel Seemaier},
130+
booktitle = {SIAM Conference on Applied and Computational Discrete Algorithms (ACDA)},
131+
year = {2025}
132+
}
108133
```
109134

110-
Please refer to https://github.com/adilchhabra/KaGen to learn more about the graph generation models and their corresponding parameters.
135+
## Licensing
136+
137+
StreamCPI is distributed under the MIT License. See [LICENSE](LICENSE) for details.
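
Reviewer note: the README's "CPI Compression Vector" section describes an append-and-access vector that run-length encodes repeating block assignments. The sketch below illustrates that idea only. It is not the kurpicz/cpi implementation, and the class name `RunLengthVector` and its methods are hypothetical names chosen for this example.

```ruby
# Hypothetical sketch of an append-only run-length-encoded vector.
# It stores one (value, start index) pair per run instead of one entry per node.
class RunLengthVector
  attr_reader :size

  def initialize
    @values = [] # value of each run
    @starts = [] # index of the first element of each run
    @size = 0
  end

  # Append a value; a new run is opened only when the value changes.
  def push(value)
    if @values.empty? || @values.last != value
      @values << value
      @starts << @size
    end
    @size += 1
    self
  end
  alias_method :<<, :push

  # Random access: binary search for the run that contains `index`.
  def [](index)
    return nil if index.negative? || index >= @size

    # First run whose start lies beyond `index`; the run before it holds the value.
    nxt = @starts.bsearch_index { |start| start > index }
    run = nxt.nil? ? @starts.length - 1 : nxt - 1
    @values[run]
  end

  def run_count
    @values.length
  end
end

# Block IDs as a streaming partitioner might assign them to consecutive nodes.
blocks = RunLengthVector.new
[0, 0, 0, 1, 1, 0, 2, 2, 2, 2].each { |b| blocks << b }
puts "#{blocks.size} entries stored in #{blocks.run_count} runs"
```

The memory saving depends entirely on how often consecutive nodes land in the same block, which is consistent with the README's description of `--kappa` as a way to reduce memory further.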

streamcpi.rb

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
class Streamcpi < Formula
2+
desc "StreamCPI - Memory-Efficient Streaming Graph Partitioning via Compression"
3+
homepage "https://github.com/KaHIP/CompressedStreamingGraphPartitioning"
4+
license "MIT"
5+
head "https://github.com/KaHIP/CompressedStreamingGraphPartitioning.git", branch: "master"
6+
7+
depends_on "cmake" => :build
8+
depends_on "gcc" => :build
9+
depends_on "open-mpi"
10+
11+
def install
12+
gcc = Formula["gcc"]
13+
gcc_version = gcc.version.major
14+
15+
cmake_args = std_cmake_args.reject { |a| a.start_with?("-DCMAKE_PROJECT_TOP_LEVEL_INCLUDES=") }
16+
17+
system "cmake", "-B", "build",
18+
"-DCMAKE_BUILD_TYPE=Release",
19+
"-DCMAKE_C_COMPILER=#{gcc.opt_bin}/gcc-#{gcc_version}",
20+
"-DCMAKE_CXX_COMPILER=#{gcc.opt_bin}/g++-#{gcc_version}",
21+
"-DCMAKE_C_FLAGS=-w",
22+
"-DCMAKE_CXX_FLAGS=-w",
23+
"-DNONATIVEOPTIMIZATIONS=ON",
24+
*cmake_args
25+
system "cmake", "--build", "build", "-j#{ENV.make_jobs}"
26+
27+
bin.install "build/stream_cpi"
28+
bin.install "build/stream_cpi_generated"
29+
end
30+
31+
test do
32+
(testpath/"test.graph").write <<~EOS
33+
4 5
34+
2 3
35+
1 3 4
36+
1 2 4
37+
2 3
38+
EOS
39+
output = shell_output("#{bin}/stream_cpi #{testpath}/test.graph --k=2 2>&1")
40+
assert_match(/cut|partition|block/, output)
41+
end
42+
end
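
Reviewer note: the formula's `test do` block writes a small METIS graph with header `4 5`. A quick way to sanity-check such a fixture is sketched below, assuming the plain unweighted METIS layout (a header line `n m`, then one line of 1-based neighbor IDs per node, with `%` marking comment lines). The helper names `parse_metis` and `metis_consistent?` are hypothetical and not part of StreamCPI or Homebrew.

```ruby
# Parse an unweighted METIS graph: header "n m", then n adjacency lines.
# Assumption: no weight/format field in the header, as in the formula's fixture.
def parse_metis(text)
  lines = text.lines(chomp: true).reject { |l| l.start_with?("%") }
  n, m = lines.first.split.map(&:to_i)
  adjacency = lines[1, n].map { |l| l.split.map(&:to_i) }
  [n, m, adjacency]
end

# Check that the header matches the body and that every edge is listed
# in both directions, with no self-loops and no out-of-range node IDs.
def metis_consistent?(n, m, adjacency)
  return false unless adjacency.length == n
  return false unless adjacency.sum(&:length) == 2 * m

  adjacency.each_with_index.all? do |neighbors, i|
    u = i + 1
    neighbors.all? { |v| v.between?(1, n) && v != u && adjacency[v - 1].include?(u) }
  end
end

# Same graph as the fixture in the formula's test block.
TEST_GRAPH = <<~GRAPH
  4 5
  2 3
  1 3 4
  1 2 4
  2 3
GRAPH

n, m, adjacency = parse_metis(TEST_GRAPH)
puts "nodes=#{n} edges=#{m} consistent=#{metis_consistent?(n, m, adjacency)}"
```

The fixture passes this check: the four adjacency lines list ten endpoints, which matches the five undirected edges declared in the header.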
