Commit c1218f2

feat(git-integration): implement tests [CM-736] (#3501)
1 parent 8443c16 commit c1218f2

15 files changed

Lines changed: 2334 additions & 281 deletions

.github/workflows/backend-lint.yaml

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@ jobs:
 
       - name: Install dependencies
         if: steps.changes.outputs.python_changed == 'true'
-        run: uv sync --extra dev --frozen
+        run: uv sync --group dev --frozen
 
       - name: Check Python linting and formatting
         if: steps.changes.outputs.python_changed == 'true'

.gitignore

Lines changed: 5 additions & 1 deletion
@@ -197,4 +197,8 @@ services/libs/tinybird/.diff_tmp
 .cursor/rules/*.local.mdc
 
 # claude code rules
-CLAUDE.md
+CLAUDE.md
+
+# git integration test repositories & output
+services/apps/git_integration/src/test/repos/
+services/apps/git_integration/src/test/outputs/custom/

services/apps/git_integration/Makefile

Lines changed: 19 additions & 4 deletions
@@ -18,7 +18,7 @@ check_setup:
 	fi
 	@if ! uv run ruff --version >/dev/null 2>&1; then \
 		echo "📦 Installing dev dependencies..."; \
-		uv sync --extra dev; \
+		uv sync --group dev; \
 	fi
 
 ##@ 🚀 Setup and installation
@@ -28,7 +28,7 @@ setup: ## Install uv and dev dependencies (only if not already installed)
 	@if command -v uv >/dev/null 2>&1; then \
 		echo "✅ uv is already installed at $$(which uv)"; \
 		echo "📦 Installing dev dependencies..."; \
-		uv sync --extra dev; \
+		uv sync --group dev; \
 		echo "✅ Setup complete"; \
 	else \
 		echo "📦 uv is required to:"; \
@@ -45,7 +45,7 @@ setup: ## Install uv and dev dependencies (only if not already installed)
 		echo 'export PATH="$$HOME/.cargo/bin:$$PATH"' >> ~/.bashrc; \
 		export PATH="$$HOME/.cargo/bin:$$PATH"; \
 		echo "📦 Installing dev dependencies..."; \
-		uv sync --extra dev; \
+		uv sync --group dev; \
 		echo "✅ uv installed and dev dependencies ready"; \
 		echo "📝 Next steps:"; \
 		echo " - Restart your terminal or run: source ~/.zshrc"; \
@@ -107,4 +107,19 @@ update_deps: check_setup ## Update all dependencies
 lock_deps: check_setup ## Generate/update uv.lock file
 	@echo "Generating uv.lock file..."
 	@uv lock
-	@echo "✅ uv.lock updated"
+	@echo "✅ uv.lock updated"
+
+##@ 🧪 Testing
+
+test: check_setup ## Run tests (test=file repo=name expected=file)
+	@echo "🧪 Running tests..."
+	@TEST_FILE="$(if $(test),src/test/$(test),src/test/)"; \
+	if [ -n "$(repo)" ]; then \
+		export TEST_REPO_NAME="$(repo)"; \
+		echo "📁 Using repository: $(repo)"; \
+	fi; \
+	if [ -n "$(expected)" ]; then \
+		export TEST_EXPECTED_FILE="$(expected)"; \
+		echo "📄 Using expected output: $(expected)"; \
+	fi; \
+	uv run pytest $$TEST_FILE -v
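
Reviewer note on the new `test` target: the sketch below restates its shell logic in Python to show how `test`, `repo`, and `expected` map onto the pytest path and the `TEST_REPO_NAME` / `TEST_EXPECTED_FILE` environment variables. The helper name `build_test_invocation` is hypothetical and not part of this commit. It only models what the recipe does.

```python
def build_test_invocation(test: str = "", repo: str = "", expected: str = ""):
    """Mirror the Makefile `test` target (hypothetical helper, illustration only).

    Returns the pytest command line and the extra environment variables
    that the recipe would export before running it.
    """
    # $(if $(test),src/test/$(test),src/test/): one file, or the whole directory
    test_file = f"src/test/{test}" if test else "src/test/"

    env = {}
    if repo:  # [ -n "$(repo)" ]
        env["TEST_REPO_NAME"] = repo
    if expected:  # [ -n "$(expected)" ]
        env["TEST_EXPECTED_FILE"] = expected

    return ["uv", "run", "pytest", test_file, "-v"], env


if __name__ == "__main__":
    argv, env = build_test_invocation(repo="insights")
    print(" ".join(argv), env)
```

Because the exports and the `pytest` call share one shell (the recipe lines are joined with `\`), the variables are visible to the test process without being written to the caller's environment.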

services/apps/git_integration/pyproject.toml

Lines changed: 10 additions & 8 deletions
@@ -36,14 +36,6 @@ dependencies = [
     "slugify>=0.0.1",
 ]
 
-[project.optional-dependencies]
-dev = [
-    "jedi>=0.18.1",
-    "pylint>=2.13.9",
-    "pytest>=7.0.0",
-    "yapf>=0.32.0",
-    "ruff>=0.3.0",
-]
 
 [project.scripts]
 crowd-git-ingest = "crowdgit.ingest:main"
@@ -85,3 +77,13 @@ quote-style = "double"
 indent-style = "space"
 skip-magic-trailing-comma = false
 line-ending = "auto"
+
+[dependency-groups]
+dev = [
+    "jedi>=0.18.1",
+    "pylint>=2.13.9",
+    "pytest>=7.0.0",
+    "pytest-asyncio>=1.2.0",
+    "yapf>=0.32.0",
+    "ruff>=0.3.0",
+]
Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+# Git Integration Tests
+
+## Overview
+
+Tests commit and activity extraction from real git repositories using actual git commands.
+
+## Structure
+
+```
+test/
+├── conftest.py                     # Pytest configuration (env vars)
+├── fixtures/
+│   ├── test_repo_seed.json         # Defines test commits
+│   ├── build_test_repo.py          # Builds git repo from seed
+│   ├── test-repo/                  # Test git repository
+│   ├── expected_activities.json    # Expected output baseline
+│   └── actual_output.json          # Current test output
+└── test_activity_extraction.py     # Test suite
+```
+
+## Running Tests
+
+```bash
+make test                                                # Run all tests
+make test test=test_activity_extraction.py               # Specific test
+make test repo=insights                                  # Test different repo (from repos/)
+make test expected=insights_expected.json                # Custom baseline (from fixtures/)
+make test repo=insights expected=insights_expected.json  # Combined
+```
+
+**Note:** `repo` and `expected` arguments are relative to `repos/` and `fixtures/` directories respectively.
+
+## How It Works
+
+1. **Test repository** created from `test_repo_seed.json` with various commit scenarios
+2. **CommitService** processes commits using real git commands
+3. **Activities captured** via mocked database (no actual DB writes)
+4. **Output compared** with expected baseline using deep equality check
+
+## Test Coverage
+
+- All activity types (authored, committed, signed-off, reviewed, tested, co-authored)
+- File statistics (insertions/deletions from git numstat)
+- Edge cases (malformed emails, unusual formats)
+- Complete structure validation (all fields compared)
+
+## Updating Baseline
+
+1. Run tests: `make test`
+2. Review `outputs/test-repo_actual.json`
+3. If correct: `cp outputs/test-repo_actual.json outputs/test-repo_expected.json`
+4. Commit the expected file to git
+5. Re-run to validate
+
+## Adding Test Cases
+
+Edit `fixtures/test_repo_seed.json`, rebuild repo, update baseline:
+
+```bash
+cd src/test
+rm -rf repos/test-repo
+python3 fixtures/build_test_repo.py
+cd ../..
+make test
+cp src/test/outputs/test-repo_actual.json src/test/outputs/test-repo_expected.json
+git add src/test/outputs/test-repo_expected.json
+```
+
+## Testing External Repositories
+
+To test with a real repository:
+
+```bash
+# Clone repo into repos/ directory
+cd src/test/repos
+git clone https://github.com/yourorg/yourrepo.git insights
+
+# Run test to generate output (first run will skip validation)
+cd ../../..
+make test repo=insights
+# Output saved to: outputs/custom/insights_actual.json
+
+# Review output and create baseline
+cp src/test/outputs/custom/insights_actual.json src/test/outputs/custom/insights_expected.json
+
+# Future runs automatically validate against the baseline
+make test repo=insights
+```
+
+**Note:** Custom repo baselines are in `outputs/custom/` which is gitignored. They're for local validation only.
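
Reviewer note on the README's baseline workflow: the test module itself is not visible in this part of the diff, so the following is only a sketch of the behaviour the README describes (write the actual output, skip validation when no baseline exists yet, otherwise deep-compare). The function name `check_against_baseline` and the returned status strings are assumptions made for illustration.

```python
import json
from pathlib import Path


def check_against_baseline(activities: list, actual_path: Path, expected_path: Path) -> str:
    """Save `activities` and compare them with a stored baseline, if any.

    Hypothetical helper: returns "skipped" when no baseline exists,
    "match" on deep equality, and "mismatch" otherwise.
    """
    # Round-trip through JSON so in-memory data and the file compare alike.
    normalized = json.loads(json.dumps(activities))

    # Always write the actual output so it can be reviewed or promoted.
    actual_path.parent.mkdir(parents=True, exist_ok=True)
    actual_path.write_text(json.dumps(normalized, indent=2, sort_keys=True))

    if not expected_path.exists():
        return "skipped"  # first run: nothing to validate against yet

    expected = json.loads(expected_path.read_text())
    # Python's == on nested lists and dicts is already a deep comparison.
    return "match" if normalized == expected else "mismatch"
```

Promoting a reviewed output to a baseline is then a plain file copy, which is what the `cp ..._actual.json ..._expected.json` steps in the README do.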
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+"""
+Pytest configuration and fixtures for git integration tests.
+
+This file sets up the test environment before any test modules are imported.
+"""
+
+import os
+
+
+def pytest_configure(config):
+    """
+    Pytest hook called before test collection.
+    Sets test environment variables before any modules are imported.
+    """
+    test_env = {
+        "CROWD_DB_WRITE_HOST": "localhost",
+        "CROWD_DB_PORT": "9999",
+        "CROWD_DB_USERNAME": "test_user",
+        "CROWD_DB_PASSWORD": "test_pass",
+        "CROWD_DB_DATABASE": "test_db",
+        "CROWD_KAFKA_BROKERS": "localhost",
+        "CROWD_KAFKA_TOPIC": "test-activities",
+        "MAX_CONCURRENT_ONBOARDINGS": "3",
+        "WORKER_POLLING_INTERVAL_SEC": "5",
+        "WORKER_ERROR_BACKOFF_SEC": "10",
+        "REPOSITORY_UPDATE_INTERVAL_HOURS": "24",
+        "MAINTAINER_RETRY_INTERVAL_DAYS": "30",
+        "MAINTAINER_UPDATE_INTERVAL_HOURS": "24",
+        "WORKER_SHUTDOWN_TIMEOUT_SEC": "3600",
+    }
+
+    # Set environment variables (only if not already set)
+    for key, value in test_env.items():
+        os.environ.setdefault(key, value)
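
Reviewer note on `os.environ.setdefault`: values already present in the environment win over the test defaults, so a developer or CI job can still point the tests at other settings. A minimal sketch of that precedence follows, using a plain dict in place of `os.environ`. The helper name `apply_test_defaults` is hypothetical.

```python
import os


def apply_test_defaults(defaults: dict, environ=None):
    """Fill in missing variables without overriding existing ones.

    `environ` defaults to os.environ; passing a plain dict keeps the
    demonstration free of side effects on the real process environment.
    """
    environ = os.environ if environ is None else environ
    for key, value in defaults.items():
        environ.setdefault(key, value)  # no-op when the key is already set
    return environ


if __name__ == "__main__":
    fake_env = {"CROWD_DB_PORT": "5432"}  # pretend this was exported by the caller
    apply_test_defaults({"CROWD_DB_PORT": "9999", "CROWD_DB_DATABASE": "test_db"}, fake_env)
    print(fake_env)
```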
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,140 @@
+#!/usr/bin/env python3
+"""
+Build test git repository from seed file.
+
+This script creates a git repository with commits defined in test_repo_seed.json.
+The repository is used for testing commit and activity extraction.
+"""
+
+import json
+import os
+import subprocess
+import sys
+from pathlib import Path
+
+
+def run_git_command(repo_path: str, command: list[str]) -> str:
+    """Run a git command in the repository."""
+    result = subprocess.run(command, cwd=repo_path, capture_output=True, text=True, check=True)
+    return result.stdout.strip()
+
+
+def initialize_repo(repo_path: str) -> None:
+    """Initialize a new git repository."""
+    if os.path.exists(repo_path):
+        print(f"Repository already exists at {repo_path}")
+        return
+
+    os.makedirs(repo_path, exist_ok=True)
+    run_git_command(repo_path, ["git", "init"])
+    run_git_command(repo_path, ["git", "config", "user.name", "Test User"])
+    run_git_command(repo_path, ["git", "config", "user.email", "test@example.com"])
+    print(f"✅ Initialized git repository at {repo_path}")
+
+
+def create_commit(repo_path: str, commit_data: dict) -> str:
+    """Create a single commit from commit data."""
+    author_name = commit_data["author"]["name"]
+    author_email = commit_data["author"]["email"]
+    message = commit_data["message"]
+
+    # Get committer info (defaults to author if not specified)
+    committer = commit_data.get("committer", commit_data["author"])
+    committer_name = committer["name"]
+    committer_email = committer["email"]
+
+    # Create/modify files
+    for file_data in commit_data["files"]:
+        file_path = os.path.join(repo_path, file_data["path"])
+        os.makedirs(os.path.dirname(file_path), exist_ok=True)
+
+        with open(file_path, "w") as f:
+            f.write(file_data["content"])
+
+        # Stage the file
+        run_git_command(repo_path, ["git", "add", file_data["path"]])
+
+    # Set author and committer using minimal environment
+    env = {
+        "GIT_AUTHOR_NAME": author_name,
+        "GIT_AUTHOR_EMAIL": author_email,
+        "GIT_COMMITTER_NAME": committer_name,
+        "GIT_COMMITTER_EMAIL": committer_email,
+        "PATH": os.environ.get("PATH", "/usr/bin:/bin"),  # Minimal PATH for git command
+    }
+
+    subprocess.run(
+        ["git", "commit", "-m", message],
+        cwd=repo_path,
+        capture_output=True,
+        text=True,
+        check=True,
+        env=env,
+    )
+
+    # Get the commit hash
+    commit_hash = run_git_command(repo_path, ["git", "rev-parse", "HEAD"])
+
+    return commit_hash
+
+
+def build_repository(seed_file: str, repo_path: str) -> dict:
+    """Build git repository from seed file."""
+    print(f"📖 Reading seed file: {seed_file}")
+
+    with open(seed_file, "r") as f:
+        seed_data = json.load(f)
+
+    print(f"🏗️ Building repository at: {repo_path}")
+
+    # Initialize repository
+    initialize_repo(repo_path)
+
+    # Create commits
+    commit_hashes = []
+    for i, commit_data in enumerate(seed_data["commits"], 1):
+        commit_hash = create_commit(repo_path, commit_data)
+        commit_hashes.append(commit_hash)
+        print(f"✅ Created commit {i}/{len(seed_data['commits'])}: {commit_hash[:8]}")
+
+    # Get repository statistics
+    total_commits = run_git_command(repo_path, ["git", "rev-list", "--count", "HEAD"])
+
+    print("\n🎉 Repository built successfully!")
+    print(f" Total commits: {total_commits}")
+    print(f" Location: {repo_path}")
+
+    return {
+        "repo_path": repo_path,
+        "commit_hashes": commit_hashes,
+        "total_commits": int(total_commits),
+    }
+
+
+def main():
+    """Main entry point."""
+    # Get script directory
+    script_dir = Path(__file__).parent
+
+    # Default paths
+    seed_file = script_dir / "test_repo_seed.json"
+    repos_dir = script_dir.parent / "repos"
+    repos_dir.mkdir(exist_ok=True)
+    repo_path = repos_dir / "test-repo"
+
+    # Allow overriding from command line
+    if len(sys.argv) > 1:
+        seed_file = Path(sys.argv[1])
+    if len(sys.argv) > 2:
+        repo_path = Path(sys.argv[2])
+
+    # Build repository
+    result = build_repository(str(seed_file), str(repo_path))
+
+    # Print result
+    print("\n📊 Build Summary:")
+    print(json.dumps(result, indent=2))
+
+
+if __name__ == "__main__":
+    main()
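
Reviewer note on the seed format: `test_repo_seed.json` is not visible in this part of the diff, but `create_commit` and `build_repository` read a fixed set of keys from it (`commits`, `author.name`, `author.email`, `message`, `files[].path`, `files[].content`, and an optional `committer`). The sketch below shows a seed with that shape and a small pre-flight check. The sample values and the `validate_seed` helper are illustrative assumptions, not part of the commit.

```python
PERSON_KEYS = {"name", "email"}
FILE_KEYS = {"path", "content"}

# Illustrative seed: only the key names are taken from build_test_repo.py.
EXAMPLE_SEED = {
    "commits": [
        {
            "author": {"name": "Alice Example", "email": "alice@example.com"},
            "committer": {"name": "Bob Example", "email": "bob@example.com"},
            "message": "Add README\n\nSigned-off-by: Alice Example <alice@example.com>",
            "files": [{"path": "docs/README.md", "content": "# Demo\n"}],
        }
    ]
}


def validate_seed(seed: dict) -> list[str]:
    """Return a list of problems that would make the build script fail."""
    if "commits" not in seed:
        return ["missing top-level 'commits' list"]

    problems = []
    for i, commit in enumerate(seed["commits"], 1):
        for key in ("author", "message", "files"):
            if key not in commit:
                problems.append(f"commit {i}: missing '{key}'")
        # 'committer' is optional; the script falls back to the author.
        for role in ("author", "committer"):
            person = commit.get(role)
            if person is not None and not PERSON_KEYS <= set(person):
                problems.append(f"commit {i}: {role} needs 'name' and 'email'")
        for file_data in commit.get("files", []):
            if not FILE_KEYS <= set(file_data):
                problems.append(f"commit {i}: each file needs 'path' and 'content'")
    return problems


if __name__ == "__main__":
    print(validate_seed(EXAMPLE_SEED))
```

Running such a check before the build would turn a `KeyError` halfway through repository creation into an upfront list of problems. Whether that is worth adding depends on how often the seed file is edited by hand.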
