Examples

Aaron Riekenberg edited this page Feb 21, 2026 · 52 revisions
  1. Find largest directories
  2. grep in large file
  3. Rename files in directory
  4. Organize files into subdirectories
  5. Computation on list of files
  6. Running diff on all files in 2 directories
  7. Processing CSV inputs using regular expression
  8. Compress files from find command

Find largest directories

Find subdirectories at max depth 1 and display from smallest to largest disk usage with du:

$ find .  -maxdepth 1 -type d | rust-parallel du -sh  | sort -h

grep in large file

Suppose we have a large text file and want to run egrep to count the lines matching a regular expression.

This command counts the words that are exactly 3 characters long in a text file:

$ cat /usr/share/dict/words | egrep -c '^...$'
1427

The command above runs a single egrep process over the entire file, which may be too slow. To run multiple egrep processes in parallel:

$ cat /usr/share/dict/words | rust-parallel --pipe egrep -c '^...$'
255
579
593

The command above breaks the file into blocks using newline as the delimiter and sends each block to a separate egrep process running in parallel. Each output line is the count from one block. For large files this can be much faster than a single process.

Use awk to sum the outputs from each egrep:

$ cat /usr/share/dict/words | rust-parallel --pipe egrep -c '^...$' | awk '{ sum += $1 } END { print sum }'
1427
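
To see why summing the per-block counts is safe, here is a minimal sketch that does not use rust-parallel at all. It assumes a small hypothetical sample file, uses split -l to stand in for the block splitting, and uses grep -E in place of egrep. Because blocks are cut on line boundaries, no line is counted twice or dropped, so the sum of the per-block counts equals the whole-file count:

```shell
# Hypothetical sample data: five lines, three of them exactly 3 characters long.
workdir=$(mktemp -d)
printf '%s\n' cat horse dog a owl > "$workdir/words.txt"

# Count matching lines over the whole file in a single process.
whole=$(grep -E -c '^...$' "$workdir/words.txt")

# Split on line boundaries into 2-line blocks, count each block, then sum.
split -l 2 "$workdir/words.txt" "$workdir/block."
summed=$(for block in "$workdir"/block.*; do
  grep -E -c '^...$' "$block"
done | awk '{ sum += $1 } END { print sum }')

echo "whole=$whole summed=$summed"
rm -rf "$workdir"
```

The same reasoning only holds for per-line operations such as counting. A pattern that spans multiple lines could be cut in half at a block boundary.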

The default block size for pipe mode is 1MiB; this can be overridden with the --block-size option:

$ cat /usr/share/dict/words | rust-parallel --block-size 200KiB --pipe egrep -c '^...$'
133
76
88
151
103
219
129
58
116
73
93
117
71
$ cat /usr/share/dict/words | rust-parallel --block-size 200KiB --pipe egrep -c '^...$' | awk '{ sum += $1 } END { print sum }'
1427

Rename files in directory

Rename files in the current directory from *.txt to *.csv.

The {} variable is the entire *.txt file name, and the {1} capture group is the prefix of the file name before .txt:

$ rust-parallel -r '(.*)\.(.*)' mv {} {1}.csv ::: *.txt

Use --dry-run to just log commands that would be executed:

$ rust-parallel --dry-run -r '(.*)\.(.*)' mv {} {1}.csv ::: *.txt
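
To check what the capture group holds before renaming anything, the same pattern can be tried with sed. This is a sketch under assumptions: csv_name is a hypothetical helper, the file names are made up, and sed -E stands in for the regular expression engine used by rust-parallel. For this pattern both treat (.*) as greedy, so the first group runs up to the last dot in the name:

```shell
# Hypothetical helper: print the name a file would get under the
# pattern (.*)\.(.*) with the replacement {1}.csv. No files are touched.
csv_name() {
  printf '%s\n' "$1" | sed -E 's/(.*)\.(.*)/\1.csv/'
}

for name in notes.txt report.v2.txt; do
  echo "mv $name $(csv_name "$name")"
done
```

A name with several dots, such as report.v2.txt, keeps everything before the final extension.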

Organize files into subdirectories

Suppose we have a directory of files beginning with YYYYMM. The following will create YYYY/MM subdirectories, then move YYYYMM* files into the subdirectories. Here {1} and {2} are automatic variables for all possible year and month combinations:

$ rust-parallel --shell 'mkdir -p {1}/{2} && mv -f {1}{2}* {1}/{2}' ::: 2023 2024 ::: 01 02 03 04 05 06 07 08 09 10 11 12

Equivalent command using seq to generate sequences of years and months:

$ rust-parallel --shell 'mkdir -p {1}/{2} && mv -f {1}{2}* {1}/{2}' ::: $(seq 2023 2024) ::: $(seq -w 12)
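
The year and month combinations can be previewed with plain nested loops. This is only an illustrative sketch of the set of pairs that the two ::: groups produce. It makes no claim about the order in which rust-parallel runs them, and the combos variable is a name chosen for this example. seq -w pads to equal width, so the months come out as 01 through 12:

```shell
# List every year/month pair: 2 years x 12 months = 24 directories.
combos=$(for year in $(seq 2023 2024); do
  for month in $(seq -w 12); do
    echo "$year/$month"
  done
done)

echo "$combos"
```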

Computation on list of files

Suppose we have a bash function analyze_file and a list of *.txt files to analyze in current directory. This example uses --jobs 4 to control max parallel jobs, --shell to call a bash function, --progress-bar to display a graphical progress bar, and --timeout-seconds to kill each job if not finished after 5 minutes.

$ analyze_file() {
  echo "in analyze_file file = $1"
  # do some expensive analysis of file $1 parameter
}

$ export -f analyze_file

$ rust-parallel --jobs 4 --shell --progress-bar --timeout-seconds $((5*60)) analyze_file ::: *.txt

Running diff on all files in 2 directories

Suppose we have two directories dir1 and dir2 within the current directory.

This command finds all files within dir1 and diffs with the same path in dir2.

Since -s shell mode is used, the entire quoted string is run with bash as a single command. The -r regular expression extracts everything after the first / character into the {1} capture group:

$ find dir1 -type f | rust-parallel -s -r '^[^/]*\/(.*)$' 'echo diffing {1} ; diff --color=always dir1/{1} dir2/{1} ; echo diff {1} returned $? '
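
The effect of the regular expression can be checked in isolation with sed. In this sketch relative_path is a hypothetical helper and the paths are made-up examples. The sed expression uses | as its delimiter so the / in the pattern needs no escaping. The first path component is dropped and the rest becomes the value used for {1}:

```shell
# Hypothetical helper: strip everything up to and including the first "/".
relative_path() {
  printf '%s\n' "$1" | sed -E 's|^[^/]*/(.*)$|\1|'
}

for path in dir1/top.txt dir1/sub/a.txt; do
  rel=$(relative_path "$path")
  # Print the comparison that would be made; nothing is executed.
  echo "diff dir1/$rel dir2/$rel"
done
```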

Processing CSV inputs using regular expression

Suppose we have an input CSV file of HTTP method, URL, and identifier. The command below makes parallel calls with curl, including a JSON body. Here {method}, {url}, and {id} are named regular expression capture groups, -j3 allows a maximum of 3 parallel jobs, and -t5 sets a 5 second timeout:

$ cat >./test.csv <<EOL
GET,http://example.com/endpoint1,1234
PUT,http://example.com/endpoint2,2345
POST,http://example.com/endpoint3,3456
EOL

$ cat test.csv | rust-parallel --regex '(?P<method>.*),(?P<url>.*),(?P<id>.*)' -j3 -t5 curl -X {method} {url} -d '{"identifier":{id},"operation":"{method}"}'
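
To make the substitution concrete, the sketch below builds roughly the curl command line that one CSV row expands to. It is an approximation written in plain shell: build_curl is a hypothetical helper, the row is split on commas with read rather than with the named capture groups, and the printed line is meant for reading, so the JSON body is shown without shell quoting. No request is sent:

```shell
# Hypothetical helper: split one CSV row into method, url, and id,
# then print the curl command those values would be substituted into.
build_curl() {
  printf '%s\n' "$1" | {
    IFS=, read -r method url id
    printf 'curl -X %s %s -d %s\n' "$method" "$url" \
      "{\"identifier\":$id,\"operation\":\"$method\"}"
  }
}

build_curl 'GET,http://example.com/endpoint1,1234'
```

This assumes each row has exactly three comma-separated fields, as in test.csv above.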

Compress files from find command

Use find to find all files in current directory and subdirectories. The -0 option works nicely with find -print0 to handle filenames that may have whitespace characters. Call gzip -f -k on each file from find command:

$ find . -type f -print0 | rust-parallel -0 gzip -f -k
