Schoolify Data Collection Guide

This guide explains how to use the data collection scripts to build a comprehensive database of Victorian schools and suburbs for the Schoolify application.

Overview

The Schoolify database contains:

  • Comprehensive Victorian school data with full school names (including campus information)
  • School rankings and academic metrics
  • Victorian suburb geocoding information

Data Collection Workflow

1. Initial Data Collection

First, run the postcode download script, which prepares the suburb data used by the later steps:

python download_postcodes.py

This script will:

  • Download a comprehensive CSV of Australian postcodes
  • Extract Victorian suburbs (postcodes 3000-3999)
  • Create a geocode cache for faster processing
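
The filtering step above can be sketched as follows. This is a minimal illustration, not the actual contents of download_postcodes.py: the column names (`postcode`, `locality`, `lat`, `long`) and the shape of the cache are assumptions, and the real CSV may be laid out differently.

```python
import csv
import io


def extract_victorian_suburbs(csv_text):
    """Build a {suburb: {"postcode", "lat", "lng"}} cache for postcodes 3000-3999.

    Column names are assumed for illustration; adjust them to the real CSV.
    """
    cache = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        postcode = row["postcode"].strip()
        # Skip malformed postcodes rather than failing the whole run.
        if not postcode.isdigit():
            continue
        if 3000 <= int(postcode) <= 3999:
            name = row["locality"].strip().title()
            # Keep the first entry seen for each suburb name.
            cache.setdefault(name, {
                "postcode": postcode,
                "lat": float(row["lat"]),
                "lng": float(row["long"]),
            })
    return cache


sample = """postcode,locality,state,long,lat
3000,MELBOURNE,VIC,144.9631,-37.8136
2000,SYDNEY,NSW,151.2093,-33.8688
3121,RICHMOND,VIC,144.9980,-37.8230
"""
cache = extract_victorian_suburbs(sample)
print(sorted(cache))
```

Keying the cache by suburb name keeps lookups simple, but two suburbs that share a name would collide; a key that combines name and postcode would avoid that.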

2. Run Comprehensive Scraper

Next, run the comprehensive scraper to collect school and suburb data:

python comprehensive_scraper.py

This script will:

  • Fetch school rankings from Better Education
  • Collect additional school data from Victorian government sources
  • Process Victorian suburbs data with geocoding information
  • Generate a complete geocode database file (geocode-db.js)
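
As a rough sketch of the last step, a suburb dictionary can be serialised into a JavaScript file like this. The variable name `geocodeDB` and the structure of each entry are hypothetical; the real format of geocode-db.js is defined by comprehensive_scraper.py and may differ.

```python
import json


def render_geocode_js(suburbs, var_name="geocodeDB"):
    """Render a suburb dictionary as JavaScript source text.

    `var_name` is a hypothetical identifier chosen for this example.
    """
    # JSON object syntax is also valid JavaScript object-literal syntax.
    body = json.dumps(suburbs, indent=2, sort_keys=True)
    return f"const {var_name} = {body};\n"


js_source = render_geocode_js({
    "Melbourne": {"postcode": "3000", "lat": -37.8136, "lng": 144.9631},
})
print(js_source)
```

Sorting the keys keeps the generated file stable between runs, which makes changes easier to review in version control.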

3. Update Database

Then update the existing database with the new comprehensive data:

python update_database.py

This script will:

  • Create backups of existing data files
  • Merge new school data with existing data
  • Update the geocode database with comprehensive suburb information
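
The backup-then-merge pattern described above might look like the sketch below. The function names, the timestamped file naming, and the rule that new values override old ones are all assumptions made for illustration; update_database.py may resolve conflicts differently.

```python
import shutil
import time
from pathlib import Path


def backup_file(path, backup_dir):
    """Copy `path` into `backup_dir` under a timestamped name and return the copy's path."""
    path = Path(path)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{path.stem}-{stamp}{path.suffix}"
    shutil.copy2(path, target)  # copy2 also preserves file timestamps
    return target


def merge_schools(existing, new):
    """Merge two {school name: fields} dictionaries.

    Assumed policy: new values win, fields present only in the existing
    data are kept, and schools missing from the new data are not removed.
    """
    merged = {name: dict(fields) for name, fields in existing.items()}
    for name, fields in new.items():
        merged.setdefault(name, {}).update(fields)
    return merged


existing = {"Example College": {"suburb": "Richmond", "rank": 40}}
new = {
    "Example College": {"rank": 35},
    "Sample Primary School": {"suburb": "Carlton"},
}
merged = merge_schools(existing, new)
print(merged)
```

Calling `backup_file` on each data file before writing the merged result means a bad scrape can be rolled back by copying the backup into place.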

4. Verify Results

Check the updated database statistics:

python count_database_entries.py
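
If you prefer to run all four steps with one command, a small wrapper along these lines could chain them. This is a sketch rather than a script shipped with the project; `run_pipeline` and its `runner` parameter are hypothetical names, and it assumes the scripts are in the current directory.

```python
import subprocess
import sys

# Script names as listed in this guide, in execution order.
STEPS = [
    "download_postcodes.py",
    "comprehensive_scraper.py",
    "update_database.py",
    "count_database_entries.py",
]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure."""
    completed = []
    for script in steps:
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the repository root runs the steps in order. Because a failure raises an exception, the database update is never attempted after a failed scrape.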

Data Sources

The scripts collect data from multiple sources:

  1. Better Education - School rankings and basic information
  2. Victorian Government - Comprehensive school listings
  3. Australian Postcodes - Victorian suburbs with geocoding information

Maintenance

To keep the database up to date:

  1. Run these scripts periodically to refresh the data
  2. Check for changes in data source formats that might require script updates
  3. Verify the data quality after each update
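
One way to approach the data quality check is a quick sanity pass over the suburb entries, as in the sketch below. The entry structure (`lat` and `lng` keys) is an assumption, and the bounding box is only a rough approximation of Victoria's extent, so treat any hits as prompts for manual review rather than definite errors.

```python
# Approximate bounding box for Victoria, chosen for illustration only.
VIC_LAT = (-39.3, -33.9)
VIC_LNG = (140.9, 150.1)


def find_problems(suburbs):
    """Return (suburb, reason) pairs for entries that look suspicious."""
    problems = []
    for name, entry in suburbs.items():
        if not name.strip():
            problems.append((name, "empty suburb name"))
        lat, lng = entry.get("lat"), entry.get("lng")
        if lat is None or lng is None:
            problems.append((name, "missing coordinates"))
        elif not (VIC_LAT[0] <= lat <= VIC_LAT[1]
                  and VIC_LNG[0] <= lng <= VIC_LNG[1]):
            problems.append((name, "coordinates outside Victoria"))
    return problems


report = find_problems({
    "Melbourne": {"lat": -37.8136, "lng": 144.9631},
    "Sydney": {"lat": -33.8688, "lng": 151.2093},
    "Nowhere": {"lat": None, "lng": None},
})
print(report)
```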

Troubleshooting

  • If the postcodes download fails, you can manually download the CSV from the URL in the script
  • If school data scraping encounters errors, check whether the source websites have changed their format
  • Backups of previous data are stored in the data/backups directory