TackleHunger · Copilot · Oct 24, 2025
diff --git a/.env.example b/.env.example
@@ -1,13 +1,23 @@
-# SIMPLE .env Configuration for Volunteers
-# Copy this file to .env and add your actual API token
+# Tackle Hunger API Configuration
+# Copy this file to .env and fill in the actual values from GitHub secrets
 
-# Required: Get this from your team lead  
-AI_SCRAPING_TOKEN=your_ai_scraping_token_here
-
-# Optional: Custom GraphQL API URL (defaults to dev API if not set)
+# GraphQL API Endpoints
 AI_SCRAPING_GRAPHQL_URL=https://devapi.sboc.us/graphql
 
-# Optional: Environment (defaults to "dev" if not set)
+# API Authentication
+AI_SCRAPING_TOKEN=your_ai_scraping_token_here
+
+# Environment Selection (dev|copilot|staging|production)
 ENVIRONMENT=dev
 
-# That's it! The code handles everything else automatically.
+# AI/ETL Operation Identifiers
+CREATED_METHOD=AI_Copilot_Assistant
+MODIFIED_BY=''
+
+# Rate limiting and timeout settings
+API_RATE_LIMIT=10
+API_TIMEOUT=30
+
+# Logging configuration
+LOG_LEVEL=INFO
+LOG_FORMAT=json
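
Note on the new `.env.example` keys: the diff does not show how `API_RATE_LIMIT`, `API_TIMEOUT`, and `LOG_LEVEL` are consumed. A minimal sketch of one way to read them with fallbacks matching the example file might look like the following. `ApiSettings` and `load_settings` are hypothetical names, and the units (requests per second, seconds) are assumptions that the PR does not state.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ApiSettings:
    """Hypothetical container for the values defined in .env.example."""
    graphql_url: str
    environment: str
    rate_limit: int   # assumed unit: requests per second
    timeout: float    # assumed unit: seconds
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> ApiSettings:
    """Read settings from a mapping (os.environ by default).

    Fallback values mirror the ones shown in .env.example.
    """
    env = os.environ if env is None else env
    return ApiSettings(
        graphql_url=env.get("AI_SCRAPING_GRAPHQL_URL", "https://devapi.sboc.us/graphql"),
        environment=env.get("ENVIRONMENT", "dev"),
        rate_limit=int(env.get("API_RATE_LIMIT", "10")),
        timeout=float(env.get("API_TIMEOUT", "30")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test, for example `load_settings({"API_TIMEOUT": "5"})`. The token is deliberately left out of the sketch so it never ends up in a printed settings object.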
diff --git a/.gitignore b/.gitignore
@@ -174,3 +174,6 @@ env/
 
 # Docker
 docker-compose.override.yml
+
+# Data output files
+charity_data.json
diff --git a/README.md b/README.md
@@ -43,6 +43,23 @@ python -m pytest tests/
 2. **GitHub Codespaces** - Cloud development environment ([Guide](docs/codespaces-setup.md))  
 3. **Docker** - Containerized environment ([Guide](docs/docker-setup.md))
 
+## 📥 Data Retrieval
+
+**Retrieve charity data for analysis:**
+
+```bash
+# Fetch all charity data and save to charity_data.json
+python scripts/retrieve_charity_data.py
+```
+
+This retrieves all charity sites (~39,000 records) and saves them to `charity_data.json`. Useful for:
+- Analyzing data gaps and missing information
+- Planning validation campaigns
+- Data quality assessment
+- Integration with other tools
+
+**📖 See the [Data Retrieval Guide](docs/DATA_RETRIEVAL_GUIDE.md) for complete documentation and analysis examples.**
+
 ## 📊 Project Goals
 
 **Target Deliverables:**
@@ -61,6 +78,7 @@ python -m pytest tests/
 ## 🆘 Need Help?
 
 - **Getting Started**: [How to Validate Charities Guide](HOW_TO_VALIDATE_CHARITIES.md)
+- **Data Retrieval**: [Data Retrieval Guide](docs/DATA_RETRIEVAL_GUIDE.md) - Fetch and analyze charity data
 - **API Reference**: GraphQL playground at https://devapi.sboc.us/graphql  
 - **Network Issues**: [Firewall Setup Guide](docs/firewall-setup.md)
 - **Questions**: Ask in the project channel

diff --git a/VOLUNTEER_QUICK_START.md b/VOLUNTEER_QUICK_START.md
@@ -15,7 +15,7 @@ Welcome to the Tackle Hunger Charity Validation project! This guide will get you
 
 3. **Verify everything works:**
    ```bash
-   python -m pytest tests/
+   python -m pytest
    ```
 
 ## 📋 What You'll Be Working On

diff --git a/docs/DATA_RETRIEVAL_GUIDE.md b/docs/DATA_RETRIEVAL_GUIDE.md
@@ -0,0 +1,240 @@
+# Charity Data Retrieval Guide
+
+This guide explains how to retrieve charity data from the Tackle Hunger API and save it as JSON for analysis.
+
+## Quick Start
+
+### 1. Ensure Prerequisites
+
+Make sure you have:
+- Python 3.13 installed
+- Dependencies installed: `pip install -r requirements.txt`
+- API token configured in `.env` file
+
+### 2. Run the Data Retrieval Script
+
+```bash
+python scripts/retrieve_charity_data.py
+```
+
+That's it! The script will:
+- Connect to the Tackle Hunger API
+- Retrieve all charity site data
+- Save it to `charity_data.json` in the project root
+- Display a summary with the record count
+
+## Example Output
+
+```
+🎯 Tackle Hunger Charity Data Retrieval
+============================================================
+
+🔄 Connecting to Tackle Hunger API...
+✅ Connected to: https://devapi.sboc.us/graphql
+🌍 Environment: dev
+
+📥 Fetching charity sites data...
+💾 Saving data to /path/to/charity_data.json...
+✅ Data saved successfully!
+📊 File size: 18679.85 KB
+
+============================================================
+📊 DATA RETRIEVAL SUMMARY
+============================================================
+Retrieved at: 2025-10-24T15:30:25.306448+00:00
+Environment: dev
+Total records: 39018
+
+📝 Sample record structure:
+  • id: S1TSHWDZ
+  • organizationId: 0RG0BS5A
+  • name: Joliet Jewish Congregation
+  • streetAddress: 250 N Midland Ave
+  • city: Joliet
+  ... and 10 more fields
+============================================================
+
+✨ Data retrieval completed successfully!
+🔍 You can now analyze the data in charity_data.json
+```
+
+## JSON Structure
+
+The output file `charity_data.json` is a single JSON object with two top-level keys: `metadata` (details about the retrieval run) and `sites` (an array of site records):
+
+```json
+{
+  "metadata": {
+    "retrieved_at": "2025-10-24T15:30:25.306448+00:00",
+    "environment": "dev",
+    "endpoint": "https://devapi.sboc.us/graphql",
+    "total_records": 39018,
+    "data_type": "charity_sites"
+  },
+  "sites": [
+    {
+      "id": "S1TSHWDZ",
+      "organizationId": "0RG0BS5A",
+      "name": "Joliet Jewish Congregation",
+      "streetAddress": "250 N Midland Ave",
+      "city": "Joliet",
+      "state": "IL",
+      "zip": "60435",
+      "publicEmail": "example@example.com",
+      "publicPhone": "555-1234",
+      "website": "https://example.com",
+      "description": "Community food pantry...",
+      "serviceArea": "Joliet area",
+      "acceptsFoodDonations": "YES",
+      "status": "OPERATIONAL",
+      "ein": "12-3456789"
+    }
+    // ... 39,017 more records
+  ]
+}
+```
+
+## Analyzing the Data with Python
+
+Here are some quick examples to get you started:
+
+### Load and Explore
+
+```python
+import json
+
+# Load the data
+with open('charity_data.json', 'r', encoding='utf-8') as f:
+    data = json.load(f)
+
+# Check metadata
+print(f"Retrieved: {data['metadata']['retrieved_at']}")
+print(f"Total records: {data['metadata']['total_records']}")
+
+# Access sites
+sites = data['sites']
+print(f"First site: {sites[0]['name']}")
+```
+
+### Filter by State
+
+```python
+# Find all sites in New York
+ny_sites = [site for site in data['sites'] if site.get('state') == 'NY']
+print(f"Found {len(ny_sites)} sites in New York")
+```
+
+### Find Sites Missing Information
+
+```python
+# Find sites without a website
+no_website = [site for site in data['sites'] if not site.get('website')]
+print(f"Sites without website: {len(no_website)}")
+
+# Find sites without email
+no_email = [site for site in data['sites'] if not site.get('publicEmail')]
+print(f"Sites without email: {len(no_email)}")
+```
+
+### Analyze by Status
+
+```python
+from collections import Counter
+
+# Count sites by status
+status_counts = Counter(site.get('status') for site in data['sites'])
+print("Sites by status:")
+for status, count in status_counts.items():
+    print(f"  {status}: {count}")
+```
+
+### Using Pandas for Advanced Analysis
+
+```python
+import pandas as pd
+
+# Convert to DataFrame for easier analysis
+df = pd.DataFrame(data['sites'])
+
+# Summary statistics
+print(df.describe())
+
+# Group by state
+state_counts = df.groupby('state').size().sort_values(ascending=False)
+print("\nTop 10 states by number of sites:")
+print(state_counts.head(10))
+
+# Check data completeness
+print("\nData completeness:")
+print(df.isnull().sum())
+```
+
+## Troubleshooting
+
+### Error: No .env file found
+
+Create a `.env` file in the project root:
+
+```bash
+cp .env.example .env
+# Edit .env and add your AI_SCRAPING_TOKEN
+```
+
+### Error: Authentication failed
+
+Check that your `AI_SCRAPING_TOKEN` in the `.env` file is correct. Contact your team lead if you need a token.
+
+### Error: Connection timeout
+
+1. Check your internet connection
+2. Verify firewall settings (see [Firewall Setup Guide](firewall-setup.md))
+3. The API may be temporarily unavailable - try again later
+
+### Large File Size
+
+The `charity_data.json` file is approximately 19 MB for ~39,000 records. This is normal. The file is excluded from git via `.gitignore`.
+
+## Data Fields Reference
+
+Each site record contains the following fields:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | String | Unique site identifier |
+| `organizationId` | String | Parent organization ID |
+| `name` | String | Site/charity name |
+| `streetAddress` | String | Street address |
+| `city` | String | City name |
+| `state` | String | State code (e.g., "NY") |
+| `zip` | String | ZIP/postal code |
+| `publicEmail` | String | Public contact email |
+| `publicPhone` | String | Public contact phone |
+| `website` | String | Website URL |
+| `description` | String | Site description |
+| `serviceArea` | String | Area served |
+| `acceptsFoodDonations` | String | Whether the site accepts food donations (YES/NO/UNKNOWN) |
+| `status` | String | Operational status |
+| `ein` | String | Tax ID number |
+
+See the [README.md](../README.md) for complete field documentation.
+
+## Next Steps
+
+After retrieving the data:
+
+1. **Analyze gaps**: Find sites missing critical information
+2. **Verify data**: Cross-reference with external sources
+3. **Plan updates**: Identify which sites need updates
+4. **Use the API**: Update sites using the GraphQL mutations
+
+See [How to Validate Charities Guide](../HOW_TO_VALIDATE_CHARITIES.md) for the complete workflow.
+
+## Support
+
+- **Technical issues**: Check the [Firewall Setup Guide](firewall-setup.md)
+- **Questions**: Ask in the project channel
+- **API Reference**: Visit https://devapi.sboc.us/graphql
+
+---
+
+**Happy analyzing! 📊**
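
Note on the guide's "JSON Structure" and "Find Sites Missing Information" sections: before analyzing an export, it can help to confirm that the file has the documented shape and to count gaps for several fields in one pass. The sketch below assumes only the `metadata.total_records` and `sites` keys and the field names shown in the guide. `check_export` and `missing_counts` are hypothetical helpers, not part of this PR.

```python
from typing import Any, Dict, Iterable, List


def check_export(data: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems found in a loaded export."""
    problems: List[str] = []
    metadata = data.get("metadata")
    sites = data.get("sites")

    if not isinstance(metadata, dict):
        problems.append("missing 'metadata' object")
    if not isinstance(sites, list):
        problems.append("missing 'sites' array")
        return problems  # the count comparison below needs a list

    # The recorded count should match the number of site records.
    if isinstance(metadata, dict):
        recorded = metadata.get("total_records")
        if recorded != len(sites):
            problems.append(
                f"total_records is {recorded} but 'sites' has {len(sites)} entries"
            )
    return problems


def missing_counts(sites: Iterable[Dict[str, Any]], fields: Iterable[str]) -> Dict[str, int]:
    """Count, per field, how many sites have no value (absent, None, or empty)."""
    fields = list(fields)
    counts = {field: 0 for field in fields}
    for site in sites:
        for field in fields:
            if not site.get(field):
                counts[field] += 1
    return counts
```

After loading the file with `json.load`, `check_export(data)` returns an empty list when the structure matches the guide, and `missing_counts(data["sites"], ["website", "publicEmail", "publicPhone"])` gives the per-field gap counts. Like the guide's own examples, the check treats empty strings as missing.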
diff --git a/docs/firewall-setup.md b/docs/firewall-setup.md
@@ -55,6 +55,16 @@ import os
 os.environ['HTTPS_PROXY'] = 'https://your-proxy:port'
 os.environ['HTTP_PROXY'] = 'http://your-proxy:port'
 
+# SSL verification (if using internal certificates)
+import ssl
+import certifi
+import requests
+
+# For custom certificate bundle
+requests_session = requests.Session()
+requests_session.verify = '/path/to/your/certificate/bundle.pem'
+```
+
 ### Security Considerations
 
 **Rate Limiting:**
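
Note on the SSL verification snippet added in the `docs/firewall-setup.md` hunk above: a hard-coded bundle path fails with a fairly opaque error if the file is absent. A small guard like the one below is one possible way to fail early. It is a sketch only; `resolve_verify` is a hypothetical helper, and it uses the standard library alone so that it runs without `requests` installed.

```python
import os
from typing import Optional, Union


def resolve_verify(bundle_path: Optional[str]) -> Union[bool, str]:
    """Return a value suitable for a requests session's `verify` attribute.

    - No path given: return True (use the default certificate store).
    - Path given and present: return the path.
    - Path given but missing: raise, rather than failing later mid-request.
    """
    if not bundle_path:
        return True
    if os.path.isfile(bundle_path):
        return bundle_path
    raise FileNotFoundError(f"Certificate bundle not found: {bundle_path}")


# Intended use with the session from the docs snippet (not executed here):
#   requests_session.verify = resolve_verify('/path/to/your/certificate/bundle.pem')
```

As an alternative that needs no code change, `requests` also honors the `REQUESTS_CA_BUNDLE` environment variable when a session trusts its environment, which is the default.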

diff --git a/pytest.ini b/pytest.ini
@@ -8,8 +8,6 @@ addopts =
     --tb=short
     --strict-markers
     --disable-warnings
-    --ignore=scripts/
-    -p no:cacheprovider
 markers =
     slow: marks tests as slow (deselect with '-m "not slow"')
     integration: marks tests as integration tests

diff --git a/requirements.txt b/requirements.txt
@@ -1,15 +1,38 @@
-# SIMPLIFIED REQUIREMENTS FOR VOLUNTEERS
-# Only the essentials - no enterprise complexity
-
-# Core libraries for GraphQL API calls
+# Core HTTP and GraphQL client libraries
 requests>=2.31.0
+httpx>=0.25.0
+graphql-core>=3.2.0
 gql[requests]>=3.4.0
 
-# Environment configuration
+# Data validation and parsing
+pydantic>=2.4.0
+pydantic[email]>=2.4.0
+pydantic-settings>=2.0.0
+
+# Environment and configuration management
 python-dotenv>=1.0.0
+pyyaml>=6.0.1
+
+# Date/time handling
+python-dateutil>=2.8.2
+
+# Async support for API operations
+aiohttp>=3.8.0
+asyncio-throttle>=1.0.2
+
+# Data processing utilities
+pandas>=2.1.0
+numpy>=1.24.0
+
+# Logging and monitoring
+structlog>=23.1.0
 
-# Testing
+# Testing frameworks
 pytest>=7.4.0
+pytest-asyncio>=0.21.0
+pytest-mock>=3.11.0
 
-# Optional development tools
+# Development utilities
 black>=23.7.0
+flake8>=6.0.0
+mypy>=1.5.0