Using two different sorting algorithms and two different paralellization strategies(fork/join and locks), this program sorts a (specified) .txt file of unique integers and outputs the sorted list to a different (specified) .txt file. The input .txt file will be an unsorted data structure [Standard Template Library (STL) vector]. This is to emulate the performance of the UNIX sort-n command.
- Mergesort - divides into two halfs, then merges sorted halfs
- Bucketsort - sorts array into different bins, sorts the bins, then rejoins
- Fork/join parellelism - implementation of forking and joined with pthreads and barriers
- Locks - synchronization mechanism for enforcing limits of resource access; also known as mutex (mutual exlcusion). Used in conjunction with multiple threading a process.
- The first idea to run merge sort on multiple threads is to fork each process. However, the issues with forking is that it copies the parent process and runs in a different memory space (while also having the same content). Imagine trying to speed up sorting, when you just end up sorting the same array multiple times. Definitely not very efficient. Instead, passing parts of the original array to different arrays to sort is the way to go. Sort parts of the entire array at the same time, and then join them. The Fix? We need to have shared memory so that all threads assigned can sort out it's respective parts of the array. Threads will take turns running on a CPU core, so the max number of threads might me the most efficient when it lines up with the CPU core count.
- Based on the number of threads (n) input by the user, n number of sub-buckets will be created. Within each of the sub-buckets, bucket sort will be used to sort the integers. Then, a final call to bucket sort to then rejoin all of those buckets.
main.cpp- primary C++ file for program executionMakefile- create executable objects; C++11arg_parser.h- error handling and parsing for program's input optionsREADME.pdf- write-up for project
randomNumberGen.cpp- creates a text file with a specified number of random numbers from [ 1 - number size ]. Default number is 10,000 unless specified when ran (as second argument).pthread_add.h- threading if compiling on macOS (will still run as normal for linux systems)
Note: If zipped, first unzip file before proceeding.
- From root directory, run
makefrom the terminal. This will generate a program called mysort - Additionally, if wanting to create a text file (source.txt) of randomly generated numbers (repeatable), run
make numGenfrom the terminal. Then,./numGen <amount of numbers to generate>. If a second argument isn't passed in (./numGen), the program will generate a 10,000 random numbers (between 1 and 30,000) by default. - Next, follow execution syntax below for parallel algoritm sorting.
Make cleanwill remove all file generated bymake, to includemake numGen.
mysort [--name]
OR
mysort [source.txt] [-o out.txt] [-t NUM THREADS] [--alg=<fjmerge,lkbucket>]
mysort --name: prints name to console.source.txt: unsorted txt file of numbers, with each number on a new line-o out.txt: sorted txt file of numebrs, in which the program outputs/writes to-t NUM_THREADS: specify how many threads to use during execution (including master thread)--alg=<fjmerge, lkbucket>: specify which algorithm to use.- fjmerge: merge sort using fork/join
- lkbucket: bucket sort using locks
- Additional outputs: time of execution in nanoseconds and seconds
- Input/output files are loaded/saved as is. Does not check for txt file types.
- Numbers within input txt file must be integers (not checked) for the program to work.