Name	Name	Last commit message	Last commit date
parent directory ..
include	include
scripts	scripts
src	src
Makefile	Makefile
README.md	README.md

Partitiong Algorithms

The task in this assignment is to reproduce two of the four partitioning algorithms described in the paper titled "Data Partitioning on Chip Multiprocessors." The purpose of the paper is to outline some partitioning strategies that are used in databases to parallelize computation in chip multiprocessors (CMPs).

Partitioning Techniques

The paper focuses solely on hash-based partitioning, not taking range or tag-aware strategies into account. Hash-based partitioning distributes data evenly across different shards using the hash of the shard key. The pros of this technique are the uniform distribution of the data giving a more consistent performance, while the cons are that data locality is lost, and range-based queries will by design be less efficient, as we have to inspect all partitions.

The four hash-based techniques that are mentioned in the paper are shown in the table below:

Technique	How it Works
Independent	Each thread processes its own partition independently without coordination
Concurrent	Threads work simultaneously on partitions with synchronization mechanisms
Count-then-move	First counts elements, then moves them to final positions in two phases
Parallel Buffers	Uses separate buffer spaces for each thread to avoid conflicts

Note

We choose to implement the independent and the concurrent output.

How to Run

To compile the project: make

To compile and run the project: scripts/run.sh with additional flags (or use the ./partition file).

Usage: ./partition <algorithm> <n_tuples> <n_hash_bits> [n_threads]

algorithm: 'ind' for independent or 'con' for concurrent
n_hash_bits: number of hash bits to use
n_threads: number of threads to use
n_tuples: number of tuples to partition (optional, default 2^(24))

Tip

The n_tuples argument is optional. If not provided, it defaults to 2^(24).

Examples:

# From project root
scripts/run.sh ind 8 4
scripts/run.sh con 8 4

# From scripts folder
cd scripts
./run.sh ind 8 4
cd ..

Setting Affinity

To compile and run the project with the affinity flag: scripts/run.sh -a or scripts/run.sh --affinity

Usage: ./partition [-a|--affinity] <algorithm> <n_hash_bits> <n_threads> [n_tuples]

-a or --affinity: optional flag to enable thread affinity

Examples:

scripts/run.sh -a ind 8 4         # method 1
scripts/run.sh --affinity con 8 4 # method 2

Script: `run-server.sh`

To run the server script: .scripts/run-server.sh. It generates a /out folder with many different runs for both the independent and concurrent output.

Example:

# From the project root
scripts/run-server.sh

Important

In the run-server.sh script, outcomment the perf command to obtain extra information on the running algorithm.

Course Context

We examine how threads and shared resources impact computer system performance. Since partitioning techniques affect performance directly, we will test and compare two different approaches through targeted experiments.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Partitiong Algorithms

Partitioning Techniques

How to Run

Setting Affinity

Script: `run-server.sh`

Course Context

FilesExpand file tree

01_partitioning

Directory actions

More options

Directory actions

More options

Latest commit

History

01_partitioning

Folders and files

parent directory

README.md

Partitiong Algorithms

Partitioning Techniques

How to Run

Setting Affinity

Script: run-server.sh

Course Context

Script: `run-server.sh`