Capstone Task: Interactive Data Exploration on Klone

🎯 GOAL: Use an interactive Slurm job and core Linux tools to explore, filter, and summarize a dataset, producing a reproducible text-based result.

Overview

0. Preparation

Before this exercise:

1. Start an Interactive Job

Request an interactive job on a compute node:

salloc --partition=ckpt --time=00:30:00 --mem=4G --cpus-per-task=1

Confirm you are no longer on the login node.

hostname
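As a complementary check, here is a minimal sketch that looks for the SLURM_JOB_ID environment variable, which Slurm sets inside a job allocation. The variable name in_job is illustrative only. This shows only that an allocation exists; hostname remains the way to see which node the shell is running on.

```shell
# Slurm exports SLURM_JOB_ID inside a job allocation; it is normally unset otherwise.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    in_job="yes"
else
    in_job="no"
fi
echo "host: $(hostname)"
echo "inside a Slurm job: $in_job"
```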

2. Set Up Your Workspace

Navigate to your tutorial working directory and create a new subdirectory for this task:

cd /gscratch/scrubbed/$USER/linux-fundamentals
mkdir capstone
cd capstone

Copy the animal dataset into this directory:

cp ../shell-lesson-data/exercise-data/animal-counts/animals.csv .
ls
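The setup steps above can be sketched so that they are safe to re-run and so that the copy is verified. This is an illustration under assumptions: the directory names under the temporary folder and the one-line sample file are made up so the sketch runs anywhere, and on Klone you would use the real paths shown above.

```shell
# Hypothetical stand-in directories; not the real tutorial paths.
work=$(mktemp -d)
mkdir -p "$work/source" "$work/capstone"   # -p: no error if the directory already exists
printf '2012-11-05,deer,5\n' > "$work/source/animals.csv"

# Copy with a relative path from inside the new directory (run in a subshell
# so the caller's working directory is left unchanged).
(cd "$work/capstone" && cp ../source/animals.csv .)

# -s is true only if the file exists and is not empty.
if [ -s "$work/capstone/animals.csv" ]; then
    copy_status="ok"
else
    copy_status="missing"
fi
echo "copy: $copy_status"
```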

3. Inspect and Understand the Dataset

Without opening the file in an editor:

  • Determine how many records are in the dataset.
  • View the first 3 lines and the last 3 lines of the file.
  • Identify how many unique animals appear in the dataset.

You should use a combination of:

  • wc
  • head / tail
  • cut, sort, uniq
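As a warm-up, the sketch below shows how wc, head, and tail behave on a made-up five-line file. The file name sample.csv and its contents are hypothetical; the same options apply to animals.csv.

```shell
# sample.csv is a made-up five-line file used only for illustration.
tmp=$(mktemp -d)
printf 'a,1\nb,2\nc,3\nd,4\ne,5\n' > "$tmp/sample.csv"

# Reading from stdin makes wc print the count without the file name.
line_count=$(wc -l < "$tmp/sample.csv")
first_two=$(head -n 2 "$tmp/sample.csv")   # first 2 lines
last_two=$(tail -n 2 "$tmp/sample.csv")    # last 2 lines

echo "lines: $line_count"
echo "$first_two"
echo "$last_two"
```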

Example (not complete):

cut -d, -f2 animals.csv | sort | uniq

What is cut -d, -f2 animals.csv doing?
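As a hint, the sketch below runs the same options on three made-up records. It assumes a comma-separated layout of date, animal, count, which may not match animals.csv exactly, so check the real file with head first.

```shell
# Three made-up comma-separated records: date, animal, count.
records=$(printf '2012-11-05,deer,5\n2012-11-05,rabbit,22\n2012-11-06,deer,2\n')

# -d, sets the field delimiter to a comma; -f2 keeps only the second field.
animals=$(printf '%s\n' "$records" | cut -d, -f2)
echo "$animals"

# Sorting puts duplicates next to each other, so uniq can collapse them.
unique_animals=$(printf '%s\n' "$animals" | sort | uniq)
echo "$unique_animals"
```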

4. Generate a Reproducible Summary

Create a summary file called animal_summary.txt using command output redirection rather than typing the contents by hand.

The file must include:

  • Total number of records
  • Alphabetically sorted list of unique animals
  • A count of how many observations exist for each animal

Hint: uniq -c prefixes each line with the number of times it occurs. It only merges adjacent duplicate lines, so sort the input first.
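To see why the order of the input matters to uniq -c, here is a small sketch on a made-up three-line list; the values are hypothetical and unrelated to the dataset.

```shell
# A made-up list in which the repeated value is not adjacent.
items=$(printf 'fox\nowl\nfox\n')

# Without sorting, the two "fox" lines are counted separately (three output lines).
unsorted_counts=$(printf '%s\n' "$items" | uniq -c)

# After sorting, duplicates are adjacent and are counted together (two output lines).
sorted_counts=$(printf '%s\n' "$items" | sort | uniq -c)

echo "$unsorted_counts"
echo "$sorted_counts"
```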

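Collect all outputs in a single file: use > for the first command, which creates or overwrites the file, and >> for each later command, which appends to it.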
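The difference between the two operators can be sketched with a throwaway file. The file name demo.txt and the echoed text are hypothetical.

```shell
tmp=$(mktemp -d)
out="$tmp/demo.txt"

echo "first run" > "$out"       # > creates the file, or overwrites it if it exists
echo "second line" >> "$out"    # >> appends to the end of the file
echo "fresh start" > "$out"     # > again: the previous two lines are replaced
echo "appended" >> "$out"

cat "$out"
```

Because > discards earlier contents, re-running a sequence that starts with > and continues with >> rebuilds the file from scratch each time, which helps keep the summary reproducible.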

Verify your file:

cat animal_summary.txt

5. Filter the Dataset

Create a new CSV file that includes only rabbit observations.

Requirements:

  • File name: rabbit_counts.csv
  • Must preserve the original CSV format
  • Must be generated using grep
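One thing to watch for when filtering a CSV with grep is that a plain word also matches longer words that contain it. The sketch below uses a made-up file (sightings.csv, with a hypothetical "foxglove" record) and assumes the value of interest sits between two commas.

```shell
tmp=$(mktemp -d)
printf '2012-11-05,fox,4\n2012-11-05,foxglove,1\n2012-11-06,fox,2\n' > "$tmp/sightings.csv"

# Including the surrounding commas in the pattern matches the whole field,
# so "foxglove" is not picked up by a search for "fox".
grep ',fox,' "$tmp/sightings.csv" > "$tmp/fox_only.csv"

fox_lines=$(wc -l < "$tmp/fox_only.csv")
echo "matching lines: $fox_lines"
cat "$tmp/fox_only.csv"
```

grep prints matching lines unchanged, so the filtered file keeps the original comma-separated layout.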

Confirm:

wc -l rabbit_counts.csv
cat rabbit_counts.csv

6. Demonstrate Safe Cleanup

Rename your summary file and remove the intermediate dataset copy:

mv animal_summary.txt final_summary.txt
rm animals.csv
ls
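One way to make the cleanup more defensive, sketched here with hypothetical file names in a temporary directory, is to delete the working copy only after confirming that the renamed summary exists and is not empty.

```shell
tmp=$(mktemp -d)
echo "total records: 3" > "$tmp/summary.txt"
echo "a,1" > "$tmp/working_copy.csv"

mv "$tmp/summary.txt" "$tmp/final.txt"

# -s is true only if the file exists and is non-empty; remove the copy only then.
if [ -s "$tmp/final.txt" ]; then
    rm "$tmp/working_copy.csv"
fi
ls "$tmp"
```

For interactive use, rm -i and mv -i ask for confirmation before deleting or overwriting a file.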

7. Exit Cleanly

Exit the interactive job:

exit

You've completed our tutorials! Stay tuned for additional content or return to the main menu here.