26 changes: 18 additions & 8 deletions .env.example
Original file line number Diff line number Diff line change
@@ -1,13 +1,23 @@
# SIMPLE .env Configuration for Volunteers
# Copy this file to .env and add your actual API token
# Tackle Hunger API Configuration
# Copy this file to .env and fill in the actual values from GitHub secrets

# Required: Get this from your team lead
AI_SCRAPING_TOKEN=your_ai_scraping_token_here

# Optional: Custom GraphQL API URL (defaults to dev API if not set)
# GraphQL API Endpoints
AI_SCRAPING_GRAPHQL_URL=https://devapi.sboc.us/graphql

# Optional: Environment (defaults to "dev" if not set)
# API Authentication
AI_SCRAPING_TOKEN=your_ai_scraping_token_here

# Environment Selection (dev|copilot|staging|production)
ENVIRONMENT=dev

# That's it! The code handles everything else automatically.
# AI/ETL Operation Identifiers
CREATED_METHOD=AI_Copilot_Assistant
MODIFIED_BY=''

# Rate limiting and timeout settings
API_RATE_LIMIT=10
API_TIMEOUT=30

# Logging configuration
LOG_LEVEL=INFO
LOG_FORMAT=json
3 changes: 3 additions & 0 deletions .gitignore
@@ -174,3 +174,6 @@ env/

# Docker
docker-compose.override.yml

# Data output files
charity_data.json
18 changes: 18 additions & 0 deletions README.md
@@ -43,6 +43,23 @@ python -m pytest tests/
2. **GitHub Codespaces** - Cloud development environment ([Guide](docs/codespaces-setup.md))
3. **Docker** - Containerized environment ([Guide](docs/docker-setup.md))

## 📥 Data Retrieval

**Retrieve charity data for analysis:**

```bash
# Fetch all charity data and save to charity_data.json
python scripts/retrieve_charity_data.py
```

This retrieves all charity sites (~39,000 records) and saves them to `charity_data.json`. Use the file for:
- Analyzing data gaps and missing information
- Planning validation campaigns
- Data quality assessment
- Integration with other tools

**📖 See the [Data Retrieval Guide](docs/DATA_RETRIEVAL_GUIDE.md) for complete documentation and analysis examples.**

## 📊 Project Goals

**Target Deliverables:**
@@ -61,6 +78,7 @@ python -m pytest tests/
## 🆘 Need Help?

- **Getting Started**: [How to Validate Charities Guide](HOW_TO_VALIDATE_CHARITIES.md)
- **Data Retrieval**: [Data Retrieval Guide](docs/DATA_RETRIEVAL_GUIDE.md) - Fetch and analyze charity data
- **API Reference**: GraphQL playground at https://devapi.sboc.us/graphql
- **Network Issues**: [Firewall Setup Guide](docs/firewall-setup.md)
- **Questions**: Ask in the project channel
2 changes: 1 addition & 1 deletion VOLUNTEER_QUICK_START.md
@@ -15,7 +15,7 @@ Welcome to the Tackle Hunger Charity Validation project! This guide will get you

3. **Verify everything works:**
```bash
python -m pytest tests/
python -m pytest
```

## 📋 What You'll Be Working On
240 changes: 240 additions & 0 deletions docs/DATA_RETRIEVAL_GUIDE.md
@@ -0,0 +1,240 @@
# Charity Data Retrieval Guide

This guide explains how to retrieve charity data from the Tackle Hunger API and save it as JSON for analysis.

## Quick Start

### 1. Ensure Prerequisites

Make sure you have:
- Python 3.13 installed
- Dependencies installed: `pip install -r requirements.txt`
- API token configured in `.env` file

### 2. Run the Data Retrieval Script

```bash
python scripts/retrieve_charity_data.py
```

That's it! The script will:
- Connect to the Tackle Hunger API
- Retrieve all charity site data
- Save it to `charity_data.json` in the project root
- Display a summary with the record count
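
The steps above can be sketched roughly as follows. The source of `scripts/retrieve_charity_data.py` is not shown in this guide, so the function names `build_payload` and `save_payload` are hypothetical; only the output layout (a `metadata` object plus a `sites` list) is taken from the JSON Structure section of this guide.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def build_payload(sites, environment, endpoint):
    """Wrap raw site records in the metadata envelope used by charity_data.json."""
    return {
        "metadata": {
            "retrieved_at": datetime.now(timezone.utc).isoformat(),
            "environment": environment,
            "endpoint": endpoint,
            "total_records": len(sites),
            "data_type": "charity_sites",
        },
        "sites": sites,
    }


def save_payload(payload, path):
    """Write the payload as UTF-8 JSON and return the file size in KB."""
    path = Path(path)
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    return path.stat().st_size / 1024


# Hypothetical usage with a single hand-written record instead of a live API call.
payload = build_payload([{"id": "S1TSHWDZ"}], "dev", "https://devapi.sboc.us/graphql")
print(payload["metadata"]["total_records"])
```

Keeping the envelope construction separate from the file write makes it possible to inspect or test the payload without touching the disk.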

## Example Output

```
🎯 Tackle Hunger Charity Data Retrieval
============================================================

🔄 Connecting to Tackle Hunger API...
✅ Connected to: https://devapi.sboc.us/graphql
🌍 Environment: dev

📥 Fetching charity sites data...
💾 Saving data to /path/to/charity_data.json...
✅ Data saved successfully!
📊 File size: 18679.85 KB

============================================================
📊 DATA RETRIEVAL SUMMARY
============================================================
Retrieved at: 2025-10-24T15:30:25.306448+00:00
Environment: dev
Total records: 39018

📝 Sample record structure:
• id: S1TSHWDZ
• organizationId: 0RG0BS5A
• name: Joliet Jewish Congregation
• streetAddress: 250 N Midland Ave
• city: Joliet
... and 10 more fields
============================================================

✨ Data retrieval completed successfully!
🔍 You can now analyze the data in charity_data.json
```

## JSON Structure

The output file `charity_data.json` is a single JSON object with two keys: `metadata`, which describes the retrieval run, and `sites`, a list with one object per charity site:

```json
{
  "metadata": {
    "retrieved_at": "2025-10-24T15:30:25.306448+00:00",
    "environment": "dev",
    "endpoint": "https://devapi.sboc.us/graphql",
    "total_records": 39018,
    "data_type": "charity_sites"
  },
  "sites": [
    {
      "id": "S1TSHWDZ",
      "organizationId": "0RG0BS5A",
      "name": "Joliet Jewish Congregation",
      "streetAddress": "250 N Midland Ave",
      "city": "Joliet",
      "state": "IL",
      "zip": "60435",
      "publicEmail": "example@example.com",
      "publicPhone": "555-1234",
      "website": "https://example.com",
      "description": "Community food pantry...",
      "serviceArea": "Joliet area",
      "acceptsFoodDonations": "YES",
      "status": "OPERATIONAL",
      "ein": "12-3456789"
    }
    // ... 39,017 more records
  ]
}
```
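
Before analyzing a downloaded file, it can help to confirm that it matches the layout above. The following is a minimal sketch, not part of the project's scripts; the helper name `check_payload` is an assumption made for illustration.

```python
def check_payload(data):
    """Return a list of problems found in a loaded charity_data.json payload."""
    problems = []
    metadata = data.get("metadata")
    sites = data.get("sites")
    if not isinstance(metadata, dict):
        problems.append("missing 'metadata' object")
    if not isinstance(sites, list):
        problems.append("missing 'sites' list")
    if isinstance(metadata, dict) and isinstance(sites, list):
        expected = metadata.get("total_records")
        # The record count in the metadata should match the actual list length.
        if expected != len(sites):
            problems.append(f"total_records is {expected} but file has {len(sites)} sites")
    return problems


# Hand-written sample whose count is deliberately wrong.
sample = {"metadata": {"total_records": 2}, "sites": [{"id": "S1TSHWDZ"}]}
print(check_payload(sample))
```

An empty list means the basic structure is as documented; a mismatch between `total_records` and the list length may indicate a truncated or hand-edited file.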

## Analyzing the Data with Python

Here are some quick examples to get you started:

### Load and Explore

```python
import json

# Load the data
with open('charity_data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Check metadata
print(f"Retrieved: {data['metadata']['retrieved_at']}")
print(f"Total records: {data['metadata']['total_records']}")

# Access sites
sites = data['sites']
print(f"First site: {sites[0]['name']}")
```

### Filter by State

```python
# Find all sites in New York
ny_sites = [site for site in data['sites'] if site.get('state') == 'NY']
print(f"Found {len(ny_sites)} sites in New York")
```

### Find Sites Missing Information

```python
# Find sites without a website
no_website = [site for site in data['sites'] if not site.get('website')]
print(f"Sites without website: {len(no_website)}")

# Find sites without email
no_email = [site for site in data['sites'] if not site.get('publicEmail')]
print(f"Sites without email: {len(no_email)}")
```
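
The same check can be generalized to several fields at once. This is a sketch under the assumption that `None`, empty strings, and whitespace-only strings all count as missing; the helper name `missing_counts` is hypothetical.

```python
def missing_counts(sites, fields):
    """Count, per field, how many site records have no usable value."""
    counts = {field: 0 for field in fields}
    for site in sites:
        for field in fields:
            value = site.get(field)
            # Treat None, empty strings and whitespace-only strings as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                counts[field] += 1
    return counts


# Hand-written sample records; with real data, pass data['sites'] instead.
sample_sites = [
    {"id": "A", "website": "https://example.com", "publicEmail": ""},
    {"id": "B", "website": None, "publicEmail": "example@example.com"},
    {"id": "C", "publicEmail": "   "},
]
print(missing_counts(sample_sites, ["website", "publicEmail", "publicPhone"]))
```

Sorting the resulting counts from highest to lowest gives a quick view of which fields most need attention in a validation campaign.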

### Analyze by Status

```python
from collections import Counter

# Count sites by status
status_counts = Counter(site.get('status') for site in data['sites'])
print("Sites by status:")
for status, count in status_counts.items():
    print(f"  {status}: {count}")
```

### Using Pandas for Advanced Analysis

```python
import pandas as pd

# Convert to DataFrame for easier analysis
df = pd.DataFrame(data['sites'])

# Summary statistics
print(df.describe())

# Group by state
state_counts = df.groupby('state').size().sort_values(ascending=False)
print("\nTop 10 states by number of sites:")
print(state_counts.head(10))

# Check data completeness
print("\nData completeness:")
print(df.isnull().sum())
```

## Troubleshooting

### Error: No .env file found

Create a `.env` file in the project root:

```bash
cp .env.example .env
# Edit .env and add your AI_SCRAPING_TOKEN
```

### Error: Authentication failed

Check that your `AI_SCRAPING_TOKEN` in the `.env` file is correct. Contact your team lead if you need a token.
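
A quick way to rule out the most common causes is to check the token value itself. The sketch below is an illustration only: the helper name `token_problem` is hypothetical, and it reads from the process environment, so values from `.env` are visible only after they have been loaded (for example with `load_dotenv()` from `python-dotenv`). The placeholder string is the one shipped in `.env.example`.

```python
import os

PLACEHOLDER = "your_ai_scraping_token_here"  # value shipped in .env.example


def token_problem(env=None):
    """Return a description of what is wrong with the token, or None if it looks set."""
    env = os.environ if env is None else env
    token = env.get("AI_SCRAPING_TOKEN", "").strip()
    if not token:
        return "AI_SCRAPING_TOKEN is not set"
    if token == PLACEHOLDER:
        return "AI_SCRAPING_TOKEN still contains the placeholder from .env.example"
    return None


print(token_problem() or "Token is set (it may still be rejected by the API)")
```

This only detects a missing or placeholder token; a token that is set but wrong or expired will still fail at the API.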

### Error: Connection timeout

1. Check your internet connection
2. Verify firewall settings (see [Firewall Setup Guide](firewall-setup.md))
3. The API may be temporarily unavailable - try again later
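
"Try again later" can be automated with a small retry helper. This is a generic sketch with exponential backoff, not code from the retrieval script; the name `retry` and its parameters are assumptions. Which exception types the script actually raises is not documented here, so pass the ones your HTTP client uses via `retry_on`.

```python
import time


def retry(operation, attempts=3, base_delay=1.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(); on a retryable error, wait and try again, doubling the delay."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical usage: a stand-in operation that fails once, then succeeds.
state = {"calls": 0}


def flaky_fetch():
    state["calls"] += 1
    if state["calls"] < 2:
        raise TimeoutError("simulated timeout")
    return "data"


print(retry(flaky_fetch, sleep=lambda seconds: None))
```

The `sleep` parameter is injectable so the helper can be exercised without real waiting.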

### Large File Size

The `charity_data.json` file is approximately 19 MB for ~39,000 records. This is normal. The file is excluded from git via `.gitignore`.

## Data Fields Reference

Each site record contains the following fields:

| Field | Type | Description |
|-------|------|-------------|
| `id` | String | Unique site identifier |
| `organizationId` | String | Parent organization ID |
| `name` | String | Site/charity name |
| `streetAddress` | String | Street address |
| `city` | String | City name |
| `state` | String | State code (e.g., "NY") |
| `zip` | String | ZIP/postal code |
| `publicEmail` | String | Public contact email |
| `publicPhone` | String | Public contact phone |
| `website` | String | Website URL |
| `description` | String | Site description |
| `serviceArea` | String | Area served |
| `acceptsFoodDonations` | String | Whether the site accepts food donations (YES/NO/UNKNOWN) |
| `status` | String | Operational status |
| `ein` | String | Tax ID number |

See the [README.md](../README.md) for complete field documentation.
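
To spot records that deviate from the table above, the field names can be compared against each record's keys. This is a minimal sketch: the helper name `field_diff` is hypothetical, the field list is copied from the table, and the `county` key in the sample is invented purely to demonstrate an unexpected field.

```python
EXPECTED_FIELDS = {
    "id", "organizationId", "name", "streetAddress", "city", "state", "zip",
    "publicEmail", "publicPhone", "website", "description", "serviceArea",
    "acceptsFoodDonations", "status", "ein",
}


def field_diff(record):
    """Return (missing, unexpected) field names for one site record."""
    keys = set(record)
    return sorted(EXPECTED_FIELDS - keys), sorted(keys - EXPECTED_FIELDS)


# Hand-written sample; "county" is an invented key, not a documented field.
missing, unexpected = field_diff({"id": "S1TSHWDZ", "name": "Joliet Jewish Congregation", "county": "Will"})
print(f"{len(missing)} documented fields missing, unexpected: {unexpected}")
```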

## Next Steps

After retrieving the data:

1. **Analyze gaps**: Find sites missing critical information
2. **Verify data**: Cross-reference with external sources
3. **Plan updates**: Identify which sites need updates
4. **Use the API**: Update sites using the GraphQL mutations

See [How to Validate Charities Guide](../HOW_TO_VALIDATE_CHARITIES.md) for the complete workflow.

## Support

- **Technical issues**: Check the [Firewall Setup Guide](firewall-setup.md)
- **Questions**: Ask in the project channel
- **API Reference**: Visit https://devapi.sboc.us/graphql

---

**Happy analyzing! 📊**
10 changes: 10 additions & 0 deletions docs/firewall-setup.md
@@ -55,6 +55,16 @@ import os
os.environ['HTTPS_PROXY'] = 'https://your-proxy:port'
os.environ['HTTP_PROXY'] = 'http://your-proxy:port'

# SSL verification (if using internal certificates)
import ssl
import certifi
import requests

# For custom certificate bundle
requests_session = requests.Session()
requests_session.verify = '/path/to/your/certificate/bundle.pem'
```

### Security Considerations

**Rate Limiting:**
2 changes: 0 additions & 2 deletions pytest.ini
@@ -8,8 +8,6 @@ addopts =
--tb=short
--strict-markers
--disable-warnings
--ignore=scripts/
-p no:cacheprovider
markers =
slow: marks tests as slow (deselect with '-m "not slow"')
integration: marks tests as integration tests
37 changes: 30 additions & 7 deletions requirements.txt
@@ -1,15 +1,38 @@
# SIMPLIFIED REQUIREMENTS FOR VOLUNTEERS
# Only the essentials - no enterprise complexity

# Core libraries for GraphQL API calls
# Core HTTP and GraphQL client libraries
requests>=2.31.0
httpx>=0.25.0
graphql-core>=3.2.0
gql[requests]>=3.4.0

# Environment configuration
# Data validation and parsing
pydantic>=2.4.0
pydantic[email]>=2.4.0
pydantic-settings>=2.0.0

# Environment and configuration management
python-dotenv>=1.0.0
pyyaml>=6.0.1

# Date/time handling
python-dateutil>=2.8.2

# Async support for API operations
aiohttp>=3.8.0
asyncio-throttle>=1.0.2

# Data processing utilities
pandas>=2.1.0
numpy>=1.24.0

# Logging and monitoring
structlog>=23.1.0

# Testing
# Testing frameworks
pytest>=7.4.0
pytest-asyncio>=0.21.0
pytest-mock>=3.11.0

# Optional development tools
# Development utilities
black>=23.7.0
flake8>=6.0.0
mypy>=1.5.0