Commit 8824e77

Initial commit: ScraperGuard v1.0 — scraper reliability framework
0 parents  commit 8824e77

97 files changed

Lines changed: 13287 additions & 0 deletions

.dockerignore

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+__pycache__
+*.pyc
+.git
+.github
+tests/
+*.egg-info
+dist/
+build/
+.pytest_cache
+.mypy_cache
+.ruff_cache

.github/workflows/ci.yml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+name: CI
+
+on: [push, pull_request]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.11", "3.12"]
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+      - run: pip install -e ".[dev]"
+      - run: pytest --tb=short -q
+      - run: ruff check src/
+      - run: mypy src/scraperguard/ --ignore-missing-imports
+
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - run: pip install ruff
+      - run: ruff check src/
+      - run: ruff format --check src/

.gitignore

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+*.egg-info/
+*.egg
+dist/
+build/
+.eggs/
+
+# Virtual environments
+.venv/
+venv/
+env/
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Testing / Coverage
+.coverage
+htmlcov/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+
+# Runtime artifacts
+*.db
+*.sqlite
+*.sqlite3
+
+# ScraperGuard local artifacts
+scraperguard.db
+scraperguard.yaml
+_dump/
+
+# Logs
+*.log
+
+# Environment
+.env
+.env.local

.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+3.13

CONTRIBUTING.md

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+# Contributing to ScraperGuard
+
+Thanks for your interest in contributing to ScraperGuard!
+
+## Development Setup
+
+1. Clone the repository:
+
+   ```bash
+   git clone https://github.com/mostafam-dev/scraperguard.git
+   cd scraperguard
+   ```
+
+2. Install in development mode:
+
+   ```bash
+   pip install -e ".[dev]"
+   ```
+
+   Or with uv:
+
+   ```bash
+   uv sync --extra dev
+   ```
+
+3. Verify the installation:
+
+   ```bash
+   pytest
+   scraperguard --help
+   ```
+
+## Running Tests
+
+```bash
+# Run all tests
+pytest
+
+# Run with coverage
+pytest --cov=scraperguard --cov-report=term-missing
+
+# Run a specific test file
+pytest tests/test_health.py
+
+# Run tests matching a pattern
+pytest -k "test_schema"
+```
+
+## Code Style
+
+We use [ruff](https://docs.astral.sh/ruff/) for linting and formatting:
+
+```bash
+# Check for lint errors
+ruff check src/ tests/
+
+# Auto-fix lint errors
+ruff check --fix src/ tests/
+
+# Format code
+ruff format src/ tests/
+```
+
+Configuration is in `pyproject.toml` under `[tool.ruff]`.
+
+## Type Checking
+
+We use [mypy](https://mypy.readthedocs.io/) for static type checking:
+
+```bash
+mypy src/scraperguard/
+```
+
+Configuration is in `pyproject.toml` under `[tool.mypy]`.
+
+## Submitting Changes
+
+1. Fork the repository and create a feature branch:
+
+   ```bash
+   git checkout -b feature/my-feature
+   ```
+
+2. Make your changes, ensuring:
+   - All tests pass (`pytest`)
+   - Code is formatted (`ruff format`)
+   - No lint errors (`ruff check`)
+   - Types check cleanly (`mypy src/scraperguard/`)
+
+3. Write tests for new functionality.
+
+4. Commit with a clear message describing the change.
+
+5. Open a pull request against the `main` branch.
+
+## Project Structure
+
+```
+src/scraperguard/   # Main package source
+tests/              # Test suite (mirrors src/ structure)
+examples/           # Working examples and configs
+docs/               # Documentation
+```
+
+## Reporting Issues
+
+Please open an issue on GitHub with:
+- A clear description of the problem
+- Steps to reproduce
+- Expected vs actual behavior
+- Python version and OS
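
CONTRIBUTING.md asks contributors to write tests for new functionality and shows `pytest -k "test_schema"` for running tests by name pattern. The sketch below illustrates what such a test file might look like. It is an assumption-laden example: `validate_record` and `REQUIRED_FIELDS` are hypothetical helpers defined inline so the file runs on its own, and they are not part of ScraperGuard's API as shown in this commit.

```python
# Hypothetical test module, e.g. tests/test_schema_example.py.
# `validate_record` is NOT a ScraperGuard function; it is defined here
# only so the example is self-contained.

REQUIRED_FIELDS = {"title", "price"}  # assumed schema, for illustration only


def validate_record(record: dict) -> list[str]:
    """Return the sorted names of required fields missing from `record`."""
    return sorted(REQUIRED_FIELDS - set(record))


# Both test names contain "test_schema", so `pytest -k "test_schema"`
# would select them and skip tests whose names do not match.
def test_schema_accepts_complete_record():
    assert validate_record({"title": "Widget", "price": 9.99}) == []


def test_schema_reports_missing_fields():
    assert validate_record({"title": "Widget"}) == ["price"]
    assert validate_record({}) == ["price", "title"]
```

pytest collects plain functions whose names start with `test_`, so no imports from pytest are needed for a file like this.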

Dockerfile

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+FROM python:3.11-slim
+
+WORKDIR /app
+
+COPY pyproject.toml .
+COPY src/ src/
+
+RUN pip install --no-cache-dir ".[api]"
+
+EXPOSE 8000
+
+CMD ["scraperguard", "serve", "--host", "0.0.0.0", "--port", "8000"]
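
This commit pins a Python version in three places: `.python-version` (3.13), the CI matrix (3.11 and 3.12), and the Dockerfile base image (3.11). The commit does not say whether the difference is intentional. As a rough sketch of how such drift could be surfaced, the snippet below compares the pins; the strings are copied from this diff rather than read from disk, and the function names are illustrative assumptions.

```python
import re

# Snippets copied from this commit. A real check would read the files instead.
PYTHON_VERSION_FILE = "3.13"
DOCKERFILE_FROM = "FROM python:3.11-slim"
CI_MATRIX = 'python-version: ["3.11", "3.12"]'


def minor_versions(text: str) -> set[str]:
    """Extract every '3.x' version string mentioned in `text`."""
    return set(re.findall(r"\b3\.\d+\b", text))


def collect_pins() -> dict[str, set[str]]:
    """Map each file to the Python versions it pins."""
    return {
        ".python-version": minor_versions(PYTHON_VERSION_FILE),
        "Dockerfile": minor_versions(DOCKERFILE_FROM),
        "ci.yml": minor_versions(CI_MATRIX),
    }


def untested_versions(pins: dict[str, set[str]]) -> set[str]:
    """Versions pinned for local or container use that CI never exercises."""
    return (pins[".python-version"] | pins["Dockerfile"]) - pins["ci.yml"]


pins = collect_pins()
drift = untested_versions(pins)
```

With the values in this commit, `drift` contains only `"3.13"`: the Dockerfile's 3.11 is covered by the CI matrix, while the local development pin is not.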

Makefile

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+.PHONY: test lint format typecheck docker-build docker-run
+
+test:
+	pytest --tb=short -q
+
+lint:
+	ruff check src/
+
+format:
+	ruff format src/
+
+typecheck:
+	mypy src/scraperguard/ --ignore-missing-imports
+
+docker-build:
+	docker build -t scraperguard .
+
+docker-run:
+	docker-compose up -d
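
The Makefile targets `test`, `lint`, and `typecheck` line up with the pre-PR checklist in CONTRIBUTING.md, and the CI `lint` job adds `ruff format --check`. For contributors without `make`, the same sequence could be sketched in Python as below. The command lines are taken from this commit; the `run_checks` helper and its injectable `runner` parameter are assumptions made for illustration and do not exist in the repository.

```python
import subprocess

# Commands copied from the Makefile and CI workflow in this commit.
# `ruff format --check` (from CI) is used instead of the Makefile's
# `ruff format`, so this script reports problems without rewriting files.
CHECKS: list[tuple[str, list[str]]] = [
    ("tests", ["pytest", "--tb=short", "-q"]),
    ("lint", ["ruff", "check", "src/"]),
    ("format", ["ruff", "format", "--check", "src/"]),
    ("types", ["mypy", "src/scraperguard/", "--ignore-missing-imports"]),
]


def run_checks(checks=CHECKS, runner=subprocess.run) -> list[str]:
    """Run every check in order and return the names of those that failed.

    `runner` defaults to subprocess.run; it can be swapped for a stub
    when exercising this function without the tools installed.
    """
    failed = []
    for name, cmd in checks:
        result = runner(cmd)
        if result.returncode != 0:
            failed.append(name)
    return failed
```

Calling `run_checks()` from the repository root runs all four tools even if an earlier one fails, which gives a full picture before opening a pull request; a wrapper could exit non-zero when the returned list is non-empty.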
