
Add charity data retrieval script with JSON output for analysis #69

Draft
Copilot wants to merge 5 commits into staging from copilot/retrieve-charity-data-json


Copilot AI commented Oct 24, 2025

Contributor

Overview

This PR implements a data retrieval script that fetches charity site data from the Tackle Hunger GraphQL API and saves it to a well-structured JSON file for analysis. This addresses the need for volunteers to easily extract and analyze charity data to identify gaps and plan validation campaigns.

What's New

Main Feature: Data Retrieval Script

A new script scripts/retrieve_charity_data.py provides a simple one-command interface to retrieve all charity data:

python scripts/retrieve_charity_data.py

Output:

  • Retrieves 39,018 charity site records from the API
  • Saves to charity_data.json (~19 MB)
  • Displays record count and summary during execution
  • Includes metadata (timestamp, environment, endpoint, count)
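The save step described above (wrap the records in a metadata envelope, write `charity_data.json`, report the count) can be sketched as follows. This is a minimal sketch, not the script's actual code: `build_payload` and `save_charity_data` are hypothetical names, and it assumes the site records have already been fetched (for example via `get_sites_for_ai()`) as a list of dicts.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def build_payload(sites, environment, endpoint):
    """Wrap fetched site records in the metadata envelope shown in this PR."""
    return {
        "metadata": {
            # Timezone-aware UTC timestamp, serialized in ISO 8601 form
            "retrieved_at": datetime.now(timezone.utc).isoformat(),
            "environment": environment,
            "endpoint": endpoint,
            "total_records": len(sites),
            "data_type": "charity_sites",
        },
        "sites": sites,
    }


def save_charity_data(sites, environment, endpoint, path="charity_data.json"):
    """Hypothetical helper: write the payload to disk and report the record count."""
    payload = build_payload(sites, environment, endpoint)
    out = Path(path)
    with out.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII charity names readable in the file
        json.dump(payload, f, indent=2, ensure_ascii=False)
    print(f"Retrieved {len(sites):,} records from {endpoint}")
    print(f"Saved to {out}")
    return payload
```

Called as `save_charity_data(sites, "dev", "https://devapi.sboc.us/graphql")`, this writes the file and prints the count with a thousands separator. Keeping `build_payload` separate lets the envelope be tested without touching the filesystem.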

JSON Structure

The output is designed for easy analysis with Python:

{
  "metadata": {
    "retrieved_at": "2025-10-24T15:39:56.613448+00:00",
    "environment": "dev",
    "endpoint": "https://devapi.sboc.us/graphql",
    "total_records": 39018,
    "data_type": "charity_sites"
  },
  "sites": [
    {
      "id": "S1TSHWDZ",
      "organizationId": "0RG0BS5A",
      "name": "Joliet Jewish Congregation",
      "streetAddress": "250 N Midland Ave",
      "city": "Joliet",
      "state": "IL",
      "zip": "60435",
      "publicEmail": null,
      "publicPhone": "(815) 741-4600",
      "website": "http://jolietjewishcongregation.com/",
      ...
    }
    // 39,017 more records
  ]
}
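Because `total_records` is stored next to the records, a reader can cheaply check that a file was not truncated before analyzing it. (The listing above is abridged; the `...` and `//` lines are placeholders, not part of the file.) A minimal sketch, assuming the envelope shown above; `load_charity_data` and `validate_charity_data` are hypothetical names, not functions from this PR:

```python
import json


def validate_charity_data(data):
    """Check the metadata/sites envelope; raise ValueError on a mismatch."""
    missing = {"metadata", "sites"} - set(data)
    if missing:
        raise ValueError(f"missing top-level keys: {sorted(missing)}")
    expected = data["metadata"].get("total_records")
    actual = len(data["sites"])
    if expected != actual:
        raise ValueError(f"metadata reports {expected} records but file has {actual}")
    return data


def load_charity_data(path="charity_data.json"):
    """Load the retrieval output and validate it before analysis."""
    with open(path, encoding="utf-8") as f:
        return validate_charity_data(json.load(f))
```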

Documentation

Added comprehensive documentation in docs/DATA_RETRIEVAL_GUIDE.md including:

  • Quick start instructions
  • JSON structure explanation
  • Python analysis examples (filtering, pandas integration, data quality checks)
  • Troubleshooting guide
  • Field reference table

Technical Changes

Bug Fixes

  1. Fixed Pydantic import compatibility - Migrated from pydantic.BaseSettings, which was moved out of Pydantic core in v2, to the separate pydantic-settings package
  2. Fixed GraphQL schema fetching - Disabled schema introspection to avoid API compatibility issues
  3. Fixed API query parameters - Updated get_sites_for_ai() to work without a server-side limit parameter (the API doesn't support one)

Code Improvements

  • Added proper error handling with helpful troubleshooting messages
  • Added support for limiting the number of records client-side
  • Added 4 new tests for the retrieval script (all passing)
  • Updated .gitignore to exclude generated JSON file
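Since the API accepts no limit argument, any limiting has to happen after the response arrives. One way to sketch that, assuming `sites` is an iterable of record dicts; `apply_limit` is a hypothetical name, and the PR does not show how its limiting option is exposed:

```python
from itertools import islice
from typing import Iterable, List, Optional


def apply_limit(sites: Iterable[dict], limit: Optional[int] = None) -> List[dict]:
    """Trim records client-side because the API has no server-side limit."""
    if limit is None:
        # No limit requested: keep every record
        return list(sites)
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # islice works on lists and lazy iterators alike
    return list(islice(sites, limit))
```

This only trims the output; the full result set is still transferred from the API, so it saves disk space and analysis time rather than network time.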

Use Cases

This enables volunteers to:

  1. Identify data gaps: Find charities missing critical fields like website, email, or description
  2. Plan validation campaigns: Analyze by state, status, or completeness
  3. Data quality assessment: Check field completion rates and identify improvement opportunities
  4. Integration: Use the JSON with pandas, analysis tools, or custom scripts
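For the data-gap and campaign-planning use cases, a pure-Python sketch that counts, per state, the sites missing at least one contact field. Treating `website`, `publicEmail`, and `publicPhone` as the "critical" fields is an assumption for illustration: all three keys appear in the sample record, while the exact key for description is not shown in the PR, so it is left out. Absent, null, and blank values all count as missing; `missing_fields` and `gaps_by_state` are hypothetical helpers.

```python
from collections import Counter

# Illustrative "critical" fields (assumption); all three keys appear in the sample record.
CRITICAL_FIELDS = ("website", "publicEmail", "publicPhone")


def missing_fields(site, fields=CRITICAL_FIELDS):
    """Return the fields that are absent, null, or blank for one site."""
    return [f for f in fields if not str(site.get(f) or "").strip()]


def gaps_by_state(sites, fields=CRITICAL_FIELDS):
    """Count, per state, the sites missing at least one of the given fields."""
    counts = Counter()
    for site in sites:
        if missing_fields(site, fields):
            # Sites with no state are grouped so they are not silently dropped
            counts[site.get("state") or "UNKNOWN"] += 1
    return counts
```

After loading the file, `gaps_by_state(data["sites"]).most_common(10)` lists the states with the most incomplete records, which could seed a validation campaign.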

Example analysis:

import json

import pandas as pd

with open('charity_data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Find sites without websites (key missing, null, or empty string)
no_website = [s for s in data['sites'] if not s.get('website')]
print(f"Sites needing website: {len(no_website)}")  # ~10,838 sites

# Count sites per state, largest first
df = pd.DataFrame(data['sites'])
print(df.groupby('state').size().sort_values(ascending=False).head())

Testing & Quality

  • ✅ All 9 tests passing
  • ✅ Successfully retrieved 39,018 records from dev API
  • ✅ CodeQL security scan: No vulnerabilities found
  • ✅ Code review completed and all comments addressed

Files Changed

New:

  • scripts/retrieve_charity_data.py - Main retrieval script
  • tests/test_retrieve_charity_data.py - Test suite
  • docs/DATA_RETRIEVAL_GUIDE.md - User documentation

Modified:

  • src/tackle_hunger/graphql_client.py - Pydantic v2 compatibility
  • src/tackle_hunger/site_operations.py - API query improvements
  • requirements.txt - Added pydantic-settings
  • .gitignore - Exclude charity_data.json
  • README.md - Added data retrieval section

Data Coverage

The retrieved data covers:

  • 70 states/territories
  • Top states: CA (2,941), TX (2,326), NY (2,249), FL (1,797), OH (1,699)
  • Field completeness: 100% for core fields (name, address, city, state, zip), 96% for status, 95% for phone, 72% for website
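Coverage figures like these can be recomputed from `charity_data.json`. A minimal sketch, assuming "complete" means a non-null, non-blank value; the PR does not say how its percentages were computed, so results from this definition may differ slightly. `completion_rates`, `top_states`, and `distinct_states` are hypothetical helper names.

```python
from collections import Counter


def completion_rates(sites, fields):
    """Percentage (0-100, one decimal) of sites with a non-blank value per field."""
    total = len(sites)
    rates = {}
    for field in fields:
        filled = sum(1 for s in sites if str(s.get(field) or "").strip())
        rates[field] = round(100 * filled / total, 1) if total else 0.0
    return rates


def top_states(sites, n=5):
    """The n most common non-empty `state` values with their site counts."""
    return Counter(s["state"] for s in sites if s.get("state")).most_common(n)


def distinct_states(sites):
    """Number of distinct non-empty `state` values."""
    return len({s["state"] for s in sites if s.get("state")})
```

For example, `completion_rates(data["sites"], ["name", "city", "publicPhone", "website"])` reports completeness for fields that appear in the sample record.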

This data is ready for immediate analysis to support the Tackle Hunger charity validation mission! 🎯

Original prompt

This section details the original issue you should resolve

<issue_title>Tackle Hunger Data Retrieval</issue_title>
<issue_description>- retrieve the data from the scripts and store it in a local json format to be evaluated later

  • I want well structured, easy to parse JSON for analysis with Python.
  • Show a count of how many records were retrieved from the API endpoint.
  • save the data to a json blob "charity_data.json"
    </issue_description>

Comments on the Issue (you are @copilot in this section)

  • Fixes TackleHunger/LNRS-Tech-for-Good-AI-Charity-Validation#68


Copilot AI self-assigned this Oct 24, 2025
Copilot AI and others added 4 commits October 24, 2025 15:23
Co-authored-by: oraweb <2296332+oraweb@users.noreply.github.com> (on each of the 4 commits)
Copilot AI changed the title from "[WIP] Retrieve and store hunger data in JSON format" to "Add charity data retrieval script with JSON output for analysis" on Oct 24, 2025
Copilot AI requested a review from oraweb October 24, 2025 15:41


Development

Successfully merging this pull request may close these issues.

Tackle Hunger Data Retrieval

2 participants