🎯 GOAL: Use an interactive Slurm job and core Linux tools to explore, filter, and summarize a dataset, producing a reproducible text-based result.
- 0. Preparation
- 1. Start an Interactive Job
- 2. Set Up Your Workspace
- 3. Inspect and Understand the Dataset
- 4. Generate a Reproducible Summary
- 5. Filter the Dataset
- 6. Demonstrate Safe Cleanup
- 7. Exit Cleanly
Before this exercise:
Request an interactive job on a compute node:
salloc --partition=ckpt --time=00:30:00 --mem=4G --cpus-per-task=1
Confirm you are no longer on the login node:
hostname
Navigate to your tutorial working directory and create a new subdirectory for this task:
cd /gscratch/scrubbed/$USER/linux-fundamentals
mkdir capstone
cd capstone
Copy the animal dataset into this directory:
cp ../shell-lesson-data/exercise-data/animal-counts/animals.csv .
ls
Without opening the file in an editor:
- Determine how many records are in the dataset.
- View the first and last 3 lines of the file.
- Identify how many unique animals appear in the dataset.
You should use a combination of:
- wc
- head / tail
- cut, sort, uniq
Example (not complete):
cut -d, -f2 animals.csv | sort | uniq
What is cut -d, -f2 animals.csv doing?
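To see what each tool contributes on its own, here is a minimal sketch run against a small made-up file. The file name sample.csv, its contents, and its date,animal,count layout are assumptions for illustration only and are not taken from animals.csv.

```shell
# Hypothetical sample data; the date,animal,count layout is an assumption.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' \
  '2024-01-01,fox,2' \
  '2024-01-02,rabbit,5' \
  '2024-01-03,fox,1' \
  '2024-01-04,bear,3' > sample.csv

# Count records (lines). Reading from stdin keeps the file name out of the output.
wc -l < sample.csv

# Show the first and last 3 lines.
head -n 3 sample.csv
tail -n 3 sample.csv

# -d, sets the field delimiter to a comma; -f2 keeps only the second field.
cut -d, -f2 sample.csv

# sort places identical names next to each other so uniq can collapse them;
# wc -l then counts the distinct names.
cut -d, -f2 sample.csv | sort | uniq | wc -l
```

The same pipeline shape should carry over to animals.csv, provided the animal name really is the second comma-separated field there.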
Create a summary file called animal_summary.txt using command output redirection (not manual typing).
The file must include:
- Total number of records
- Alphabetically sorted list of unique animals
- A count of how many observations exist for each animal
Hint:
uniq -c prefixes each line with the number of times it occurs. It only merges adjacent duplicates, so sort the input first.
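The effect of sorting before uniq -c is easiest to see side by side. This is a small sketch using a made-up list of names (the variable names and its contents are hypothetical):

```shell
# Hypothetical input: one animal name per line, made up for illustration.
names=$(printf '%s\n' fox rabbit fox bear rabbit fox)

# Without sorting, no duplicates are adjacent, so every line gets a count of 1.
printf '%s\n' "$names" | uniq -c

# Sorting first makes duplicates adjacent, so the counts are per name.
printf '%s\n' "$names" | sort | uniq -c
```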
Write all outputs to a single file: use > for the first command (it creates or overwrites the file) and >> for the remaining commands (it appends).
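As a rough illustration of how the two redirection operators combine, the sketch below builds a small report from a made-up input. The file names names.txt and report.txt are hypothetical and stand in for your own files.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' fox rabbit fox > names.txt   # hypothetical input

# > creates report.txt, or truncates it if it already exists.
echo "Total lines:" > report.txt

# >> appends to the end without touching what is already there.
wc -l < names.txt >> report.txt
echo "Counts per name:" >> report.txt
sort names.txt | uniq -c >> report.txt

cat report.txt
```

Because the first command uses >, rerunning the whole sequence rebuilds the report from scratch instead of stacking duplicate sections, which is what makes the result reproducible.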
Verify your file:
cat animal_summary.txt
Create a new CSV file that includes only rabbit observations.
Requirements:
- File name: rabbit_counts.csv
- Must preserve the original CSV format
- Must be generated using grep
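The general pattern can be sketched on a made-up file with a different animal. Here sample.csv, fox_only.csv, and the date,animal,count layout are all assumptions for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' \
  '2024-01-01,fox,2' \
  '2024-01-02,rabbit,5' \
  '2024-01-03,fox,1' > sample.csv   # hypothetical data

# grep prints whole matching lines, so the comma-separated layout is preserved.
# Including the surrounding commas ties the match to a complete field
# (assumes the animal name is a middle column).
grep ',fox,' sample.csv > fox_only.csv

wc -l fox_only.csv
cat fox_only.csv
```

Matching on the bare word would also select lines where the word appears inside a longer value, so anchoring on the delimiters is a slightly safer habit when the column layout is known.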
Confirm:
wc -l rabbit_counts.csv
cat rabbit_counts.csv
Rename your summary file and remove the intermediate dataset copy:
mv animal_summary.txt final_summary.txt
rm animals.csv
ls
Exit the interactive job:
exit
You've completed our tutorials! Stay tuned for additional content or return to the main menu here.