Skip to content

Latest commit

 

History

History
61 lines (46 loc) · 4.15 KB

File metadata and controls

61 lines (46 loc) · 4.15 KB

〽️ C++ sorting algorithms

Using two different sorting algorithms and two different paralellization strategies(fork/join and locks), this program sorts a (specified) .txt file of unique integers and outputs the sorted list to a different (specified) .txt file. The input .txt file will be an unsorted data structure [Standard Template Library (STL) vector]. This is to emulate the performance of the UNIX sort-n command.

Sorting Algorithms:

  1. Mergesort - divides into two halfs, then merges sorted halfs
  2. Bucketsort - sorts array into different bins, sorts the bins, then rejoins

🗜️ Added functionality:

  1. Fork/join parellelism - implementation of forking and joined with pthreads and barriers
  2. Locks - synchronization mechanism for enforcing limits of resource access; also known as mutex (mutual exlcusion). Used in conjunction with multiple threading a process.

Merge Sort Paralellism

  • The first idea to run merge sort on multiple threads is to fork each process. However, the issues with forking is that it copies the parent process and runs in a different memory space (while also having the same content). Imagine trying to speed up sorting, when you just end up sorting the same array multiple times. Definitely not very efficient. Instead, passing parts of the original array to different arrays to sort is the way to go. Sort parts of the entire array at the same time, and then join them. The Fix? We need to have shared memory so that all threads assigned can sort out it's respective parts of the array. Threads will take turns running on a CPU core, so the max number of threads might me the most efficient when it lines up with the CPU core count.

Bucket Sort Using Locks

  • Based on the number of threads (n) input by the user, n number of sub-buckets will be created. Within each of the sub-buckets, bucket sort will be used to sort the integers. Then, a final call to bucket sort to then rejoin all of those buckets.

🗄️ Code Organization

📁 Files

  1. main.cpp - primary C++ file for program execution
  2. Makefile - create executable objects; C++11
  3. arg_parser.h - error handling and parsing for program's input options
  4. README.pdf - write-up for project

💾 Additional file

  1. randomNumberGen.cpp- creates a text file with a specified number of random numbers from [ 1 - number size ]. Default number is 10,000 unless specified when ran (as second argument).
  2. pthread_add.h - threading if compiling on macOS (will still run as normal for linux systems)

Compiling

Note: If zipped, first unzip file before proceeding.

  1. From root directory, run make from the terminal. This will generate a program called mysort
  2. Additionally, if wanting to create a text file (source.txt) of randomly generated numbers (repeatable), run make numGen from the terminal. Then, ./numGen <amount of numbers to generate>. If a second argument isn't passed in (./numGen), the program will generate a 10,000 random numbers (between 1 and 30,000) by default.
  3. Next, follow execution syntax below for parallel algoritm sorting.
  4. Make clean will remove all file generated by make, to include make numGen.

Execution

Program option/input parameters

mysort [--name] OR mysort [source.txt] [-o out.txt] [-t NUM THREADS] [--alg=<fjmerge,lkbucket>]

  1. mysort --name: prints name to console.
  2. source.txt: unsorted txt file of numbers, with each number on a new line
  3. -o out.txt: sorted txt file of numebrs, in which the program outputs/writes to
  4. -t NUM_THREADS: specify how many threads to use during execution (including master thread)
  5. --alg=<fjmerge, lkbucket>: specify which algorithm to use.
    • fjmerge: merge sort using fork/join
    • lkbucket: bucket sort using locks
  • Additional outputs: time of execution in nanoseconds and seconds

🐜 Surviving Bugs

  • Input/output files are loaded/saved as is. Does not check for txt file types.
  • Numbers within input txt file must be integers (not checked) for the program to work.

Resources