A hands-on lab for researchers: store data in Amazon S3 and process it with Python in a web-based VS Code environment.
Region: Singapore (ap-southeast-1)
Time: ~45–60 minutes
```
.
├── LAB.md               # This lab (instructions)
├── requirements.txt     # Python dependencies (boto3, pandas)
├── process_data.py      # Script: read from S3, process, write to S3
└── sample-data/
    ├── genomics_sample.csv
    └── sensor_readings.json
```
- Create an S3 bucket in the Singapore region via the AWS Management Console.
- Upload sample research datasets (CSV and JSON) to S3.
- Run a Python script that reads from S3, computes statistics, filters data, converts formats, and writes results back to S3.
- AWS account with console access.
- Web-based VS Code environment with Python 3 and terminal access.
- Basic familiarity with Python and CSV/JSON.
- **Sign in to AWS**: Go to https://console.aws.amazon.com and sign in.
- **Open S3**: In the search bar, type S3 and open S3 (or go to Services → Storage → S3).
- **Create bucket**: Click Create bucket.
- **Bucket settings**:
  - Bucket name: Choose a globally unique name (e.g. `research-data-yourname-2025`). S3 bucket names must be unique across all AWS accounts and regions.
  - Region: Select Asia Pacific (Singapore) `ap-southeast-1`.
  - Block Public Access: Leave the default (Block all public access) unless your use case requires public access.
  - Bucket Versioning: Optional; you can leave it set to Disable for this lab.
  - Default encryption: Optional; server-side encryption (SSE-S3) is a good choice for research data.
- **Create**: Click Create bucket at the bottom.
- **Note your bucket name**: You will need it for uploading data and for the Python script.
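If you prefer to script this step instead of clicking through the console, the same bucket can be created with boto3. A minimal sketch, assuming your AWS credentials are already configured and using the placeholder bucket name from this lab:

```python
import boto3

# Placeholder name -- replace with your own globally unique bucket name.
BUCKET_NAME = "research-data-yourname-2025"

s3 = boto3.client("s3", region_name="ap-southeast-1")

# Outside us-east-1, S3 requires an explicit LocationConstraint
# matching the client's region.
s3.create_bucket(
    Bucket=BUCKET_NAME,
    CreateBucketConfiguration={"LocationConstraint": "ap-southeast-1"},
)
```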
You will upload two sample datasets into your bucket: a genomics-style CSV and a sensor JSON file. Use the files provided in this lab's `sample-data/` folder:
| File | Description |
|---|---|
| `sample-data/genomics_sample.csv` | Sample IDs, gene names, expression values, condition (control/treatment) |
| `sample-data/sensor_readings.json` | Sensor IDs, timestamps, numeric values, units (temperature, pH) |
- In the S3 console, click your bucket name.
- Click Upload.
- Add files:
  - `sample-data/genomics_sample.csv` → upload as `genomics_sample.csv` (or keep the path; see script config below)
  - `sample-data/sensor_readings.json` → upload as `sensor_readings.json`
- Under Destination, leave the prefix empty so files land in the bucket root. The script expects keys `genomics_sample.csv` and `sensor_readings.json`. If you use a prefix (e.g. `raw/`), set `GENOMICS_KEY` and `SENSOR_KEY` in `process_data.py` accordingly (e.g. `raw/genomics_sample.csv`).
- Click Upload, then Done.

Check: In the bucket, you should see your CSV and JSON files (e.g. in the root or under `raw/`).
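The console upload above is all the lab requires, but if you would rather upload from your VS Code terminal, a boto3 sketch along these lines also works (bucket name is a placeholder):

```python
import boto3

BUCKET_NAME = "research-data-yourname-2025"  # placeholder -- use your bucket
s3 = boto3.client("s3", region_name="ap-southeast-1")

# Upload the local sample files to the bucket root, matching the
# object keys the processing script expects.
s3.upload_file("sample-data/genomics_sample.csv", BUCKET_NAME, "genomics_sample.csv")
s3.upload_file("sample-data/sensor_readings.json", BUCKET_NAME, "sensor_readings.json")
```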
In your VS Code environment, open a terminal and install dependencies:
```bash
pip install -r requirements.txt
```

Ensure you have AWS credentials configured (e.g. environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and optionally `AWS_SESSION_TOKEN`, or an AWS CLI profile). The script uses the default credential chain (environment variables or `~/.aws/credentials`).
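To confirm boto3 can actually see your credentials before running the script, a quick optional sanity check (not part of the lab files):

```python
import boto3

# Resolves credentials via the default chain and prints who you are.
ident = boto3.client("sts").get_caller_identity()
print("Account:", ident["Account"])
print("ARN:", ident["Arn"])
```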
- **Set your bucket name**: Open `process_data.py` and set the variable `BUCKET_NAME` to your bucket name (e.g. `research-data-yourname-2025`).
- **Set region**: Ensure the script uses region `ap-southeast-1` (Singapore) for S3; the sample script does this by default.
- **Run the script**:

  ```bash
  python process_data.py
  ```
- Reads from your S3 bucket: `genomics_sample.csv` (expression data) and `sensor_readings.json` (sensor readings).
- Processes the data:
  - Statistics: mean, median, min, max, count for genomics `expression` and for sensor `value`.
  - Filtering: genomics rows with `expression >= 10`; sensor records with `value >= 22`.
  - Format conversion: filtered genomics written as CSV and JSON; filtered sensor data written as JSON and CSV.
- Writes results to the `processed/` folder inside the same bucket: `processed/genomics_stats.json`, `processed/genomics_filtered.csv`, `processed/genomics_filtered.json`, `processed/sensor_stats.json`, `processed/sensor_filtered.json`, `processed/sensor_filtered.csv`.
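For orientation, here is a condensed sketch of the kind of logic `process_data.py` performs for the genomics file. It is not the actual lab script; the column name `expression` and the output keys follow the description above, and the bucket name is a placeholder:

```python
import io
import json

import boto3
import pandas as pd

BUCKET_NAME = "research-data-yourname-2025"  # placeholder -- use your bucket
s3 = boto3.client("s3", region_name="ap-southeast-1")

# Read the CSV object from S3 into a DataFrame.
obj = s3.get_object(Bucket=BUCKET_NAME, Key="genomics_sample.csv")
df = pd.read_csv(io.BytesIO(obj["Body"].read()))

# Summary statistics over the expression column.
stats = {
    "mean": float(df["expression"].mean()),
    "median": float(df["expression"].median()),
    "min": float(df["expression"].min()),
    "max": float(df["expression"].max()),
    "count": int(df["expression"].count()),
}

# Filter rows and write both results back under processed/.
filtered = df[df["expression"] >= 10]
s3.put_object(
    Bucket=BUCKET_NAME,
    Key="processed/genomics_stats.json",
    Body=json.dumps(stats).encode("utf-8"),
)
s3.put_object(
    Bucket=BUCKET_NAME,
    Key="processed/genomics_filtered.csv",
    Body=filtered.to_csv(index=False).encode("utf-8"),
)
```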
After a successful run, refresh your bucket in the S3 console and open the output folder to see the generated files.
- In the S3 console, open your bucket.
- Open the output folder (e.g. `processed/` or `results/`).
- Download and open the generated files (e.g. summary JSON, processed CSV/JSON) and confirm they match what the script describes (statistics, filtered records, format conversion).
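You can also verify from the terminal. A short listing sketch, again with a placeholder bucket name:

```python
import boto3

BUCKET_NAME = "research-data-yourname-2025"  # placeholder -- use your bucket
s3 = boto3.client("s3", region_name="ap-southeast-1")

# List everything the script wrote under processed/.
resp = s3.list_objects_v2(Bucket=BUCKET_NAME, Prefix="processed/")
for item in resp.get("Contents", []):
    print(item["Key"], item["Size"])
```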
- **Permissions**: The IAM user/role used must have `s3:GetObject` and `s3:PutObject` (and `s3:ListBucket` if the script lists objects) on your bucket.
- **Region**: Bucket and script must use the same region (e.g. `ap-southeast-1`).
- **Paths**: If you uploaded files under a prefix (e.g. `raw/`), set the script's input key names to match (e.g. `raw/genomics_sample.csv`).
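As an illustration of the permissions point above, an IAM policy along these lines grants the needed access; the bucket ARN uses the placeholder name from this lab:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::research-data-yourname-2025/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::research-data-yourname-2025"
    }
  ]
}
```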
| Step | Action |
|---|---|
| 1 | Create S3 bucket in Singapore (`ap-southeast-1`) via the AWS Console. |
| 2 | Upload `genomics_sample.csv` and `sensor_readings.json` to the bucket. |
| 3 | Install dependencies (`pip install -r requirements.txt`) and configure AWS credentials. |
| 4 | Set `BUCKET_NAME` in `process_data.py` and run `python process_data.py`. |
| 5 | Check the output folder in S3 for processed and summary files. |
Lab designed for researchers learning AWS S3 and Python data processing.