Commit 08903e3

Refactor: Data directory structure, R2 backup, and fixed branch strategy in workflows (#29)
## Summary

This PR modernizes the SFS workflow architecture with three major improvements:

1. Cleaner `data/` directory structure for all committed files
2. Cloudflare R2 as permanent backup for JSON files
3. Fixed branch strategy to eliminate branch proliferation

## Changes

### 📁 New Directory Structure

- **`data/sfs_json/`** - Source JSON files (committed to git + backed up to R2)
- **`data/md-markers/`** - Markdown with selex tags (committed to git)
- **`output/`** - Generated files (ignored in git, HTML goes to R2)

### ☁️ Cloudflare R2 Backup

- All JSON files synced to R2 for permanent storage
- Provides redundancy and protection against git issues
- html-export uses smart fallback: git first, R2 if unavailable

### 🔄 Fixed Branch Strategy (NEW)

- **Eliminates branch proliferation**: Single branch `workflow-artifact-data` instead of timestamped branches
- **Automatic PR lifecycle**: Old PR closed, new PR created on each update
- **Workflow coordination**: Both fetch-sfs and upcoming-changes use same branch
- **Temporal branches unchanged**: Date-range branches for historical commits remain separate

### ⚙️ Configurable Git Commits

- **`enable_git_commit`** (default: true) - Enable/disable git commits
- **`branch_name`** (default: `workflow-artifact-data`) - Configurable branch name

## Benefits

- ✅ **Better organization** - All source data in `data/`, generated in `output/`
- ✅ **Permanent backup** - JSON files in both git and R2
- ✅ **No branch cleanup needed** - Single fixed branch, reused on each run
- ✅ **Simplified coordination** - All workflows use same branch
- ✅ **Clear naming** - `md-markers` shows selex tags included

## Before/After

| Aspect | Before | After |
|--------|--------|-------|
| Branch strategy | New timestamped branch each run | One fixed branch, reused |
| Branch cleanup | Manual (never done) | Automatic (single branch) |
| Workflow coordination | Separate branches | Shared fixed branch |
| PR management | Many PRs accumulate | One PR per update cycle |

## Testing

- ✅ YAML syntax validated
- Ready for manual workflow trigger to test fixed branch behavior
- Temporal workflows remain independent and unaffected
1 parent 25bf65b commit 08903e3

4 files changed

Lines changed: 147 additions & 28 deletions

.github/workflows/fetch-sfs-workflow.yml

Lines changed: 84 additions & 17 deletions
@@ -11,6 +11,16 @@ on:
         required: false
         default: '30'
         type: string
+      enable_git_commit:
+        description: 'Aktivera git commits av markdown-filer'
+        required: false
+        default: true
+        type: boolean
+      branch_name:
+        description: 'Branch namn att använda (krävs om enable_git_commit är true)'
+        required: false
+        default: 'workflow-artifact-data'
+        type: string

 permissions:
   contents: write
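
The `enable_git_commit` input is declared with `type: boolean`, while the steps in this workflow test it with `inputs.enable_git_commit != 'false'`. Depending on how the value reaches the expression (a real boolean on a manual run, possibly empty when the workflow is not started manually), a comparison against the string `'false'` may not skip the step as intended. A minimal sketch of one way to sidestep this, assuming the value is handed to the step as a string in an environment variable; `ENABLE_GIT_COMMIT` and `flag_enabled` are hypothetical names, not part of this commit:

```shell
#!/bin/sh
# Hypothetical helper: interpret a workflow flag that arrives as a string.
# $1 = raw value (may be empty), $2 = default used when $1 is empty.
flag_enabled() {
  value=${1:-$2}
  case "$value" in
    true|True|TRUE|1) return 0 ;;
    *) return 1 ;;
  esac
}

# ENABLE_GIT_COMMIT is an assumed variable name; the default mirrors
# the input's declared default of true.
if flag_enabled "${ENABLE_GIT_COMMIT:-}" true; then
  echo "git commit step enabled"
else
  echo "git commit step skipped"
fi
```

Doing the check inside the script keeps the decision in one place and makes the empty case explicit, at the cost of the step always starting.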
@@ -38,41 +48,75 @@ jobs:

       - name: Fetch JSON from beta.rkrattsbaser.gov.se
         run: |
-          python downloaders/fetch_new_sfs_docs.py --days ${{ inputs.days || '1' }} --output sfs_json
+          python downloaders/fetch_new_sfs_docs.py --days ${{ inputs.days || '1' }} --output data/sfs_json
         env:
           PYTHONPATH: ${{ github.workspace }}

       - name: Process JSON files to Markdown files with Selex tags
         run: |
-          python sfs_processor.py --input sfs_json --output output/md --formats md-markers
+          python sfs_processor.py --input data/sfs_json --output data/md-markers --formats md-markers
         env:
           PYTHONPATH: ${{ github.workspace }}

+      - name: Upload JSON source files to Cloudflare R2 for permanent storage
+        run: |
+          aws configure set aws_access_key_id ${{ secrets.CLOUDFLARE_R2_ACCESS_KEY_ID }}
+          aws configure set aws_secret_access_key ${{ secrets.CLOUDFLARE_R2_SECRET_ACCESS_KEY }}
+          aws configure set region us-east-1
+          aws configure set output json
+
+          # Upload all JSON files to R2 for backup
+          aws s3 sync data/sfs_json/ s3://${{ secrets.CLOUDFLARE_R2_BUCKET_NAME }}/sfs_json/ \
+            --endpoint-url https://${{ secrets.CLOUDFLARE_R2_ACCOUNT_ID }}.r2.cloudflarestorage.com \
+            --content-type "application/json" \
+            --exclude "*.md" \
+            --include "*.json"
+        env:
+          AWS_DEFAULT_REGION: us-east-1
+
       - name: Configure Git
+        if: inputs.enable_git_commit != 'false'
         run: |
           git config --local user.email "action@github.com"
           git config --local user.name "GitHub Action"

       - name: Commit and push changes
+        if: inputs.enable_git_commit != 'false'
         id: commit_changes
         run: |
           # Get current branch
           CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD)
-
+
           # Create a new branch for commits if we have changes
-          git add output/md/ sfs_json/
+          git add data/md-markers/ data/sfs_json/
           if git diff --staged --quiet; then
             echo "Inga nya filer att committa"
             echo "has_changes=false" >> $GITHUB_OUTPUT
           else
-            # Create a unique branch name for the commits
-            TIMESTAMP=$(date +'%Y%m%d_%H%M%S')
-            COMMIT_BRANCH="sfs_updates_${TIMESTAMP}"
-
-            # Create and switch to the new branch
-            git checkout -b "$COMMIT_BRANCH"
-
-            git commit -m "Automatisk uppdatering av SFS-författningar $(date +'%Y-%m-%d')"
+            # Use fixed branch name from input (default: workflow-artifact-data)
+            COMMIT_BRANCH="${{ inputs.branch_name }}"
+
+            # Check if branch exists remotely
+            if git ls-remote --heads origin "$COMMIT_BRANCH" | grep -q "$COMMIT_BRANCH"; then
+              echo "📥 Branch '$COMMIT_BRANCH' exists, checking out and merging with main..."
+              # Branch exists - fetch and checkout
+              git fetch origin "$COMMIT_BRANCH"
+              git checkout "$COMMIT_BRANCH"
+              # Merge latest main into fixed branch to keep it updated
+              git merge origin/main --no-edit --strategy-option theirs || echo "Merge completed (conflicts auto-resolved)"
+            else
+              echo "🆕 Branch '$COMMIT_BRANCH' doesn't exist, creating new..."
+              # Branch doesn't exist - create new
+              git checkout -b "$COMMIT_BRANCH"
+            fi
+
+            # Commit with multi-line message
+            git commit -m "Automatisk uppdatering av SFS-författningar $(date +'%Y-%m-%d')" \
+              -m "" \
+              -m "Inkluderar:" \
+              -m "- Käll-JSON (data/sfs_json/)" \
+              -m "- Markdown med selex-taggar (data/md-markers/)" \
+              -m "- Backup till Cloudflare R2"
             echo "has_changes=true" >> $GITHUB_OUTPUT
             echo "commit_branch=${COMMIT_BRANCH}" >> $GITHUB_OUTPUT

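
The upload step above filters with `--exclude "*.md" --include "*.json"`. In the AWS CLI every file is included by default and later filters take precedence, so that pair only drops Markdown files; anything else in the directory would still be synced and tagged with the JSON content type. The download step in `html-export-workflow.yml` uses the stricter `--exclude "*" --include "*.json"` pair. A minimal sketch of an upload that uses the same pair and relies on the standard `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables instead of `aws configure set`; the function names and the `DRY_RUN` switch are hypothetical, not part of this commit:

```shell
#!/bin/sh
# Build the R2 S3-compatible endpoint URL from a Cloudflare account ID.
r2_endpoint() {
  printf 'https://%s.r2.cloudflarestorage.com\n' "$1"
}

# Hypothetical wrapper: sync only *.json from a local directory to R2.
# $1 = local directory, $2 = bucket name, $3 = account ID.
# Credentials are expected in AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
upload_json_to_r2() {
  src=$1 bucket=$2 account=$3
  set -- aws s3 sync "$src" "s3://$bucket/sfs_json/" \
    --endpoint-url "$(r2_endpoint "$account")" \
    --content-type "application/json" \
    --exclude "*" --include "*.json"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Dry run with placeholder values; nothing is uploaded.
DRY_RUN=1 upload_json_to_r2 data/sfs_json/ example-bucket example-account
```

In a workflow the secrets would be mapped into the step's `env:` block, which keeps them out of the script text itself.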
@@ -105,24 +149,47 @@ jobs:
               echo "::error::Git push misslyckades: ${PUSH_ERROR}"
               exit 1
             fi
-
+
+            # Manage PR lifecycle: close old, create new
+            echo "🔄 Hanterar PR för branch '$COMMIT_BRANCH'..."
+
+            # Close existing PR for this branch if it exists
+            EXISTING_PR=$(gh pr list --head "$COMMIT_BRANCH" --json number --jq '.[0].number' 2>/dev/null || echo "")
+            if [ -n "$EXISTING_PR" ]; then
+              echo "Stänger befintlig PR #$EXISTING_PR"
+              gh pr close "$EXISTING_PR" --comment "🔄 Stänger för att öppna ny PR med uppdaterad data från $(date +'%Y-%m-%d %H:%M')"
+            fi
+
+            # Create new PR
+            echo "📝 Skapar ny PR..."
+            gh pr create --base main --head "$COMMIT_BRANCH" \
+              --title "SFS Data Update - $(date +'%Y-%m-%d %H:%M')" \
+              --body "Automatisk uppdatering av SFS-data. Inkluderar Käll-JSON, Markdown med selex-taggar och backup till R2. Workflow: fetch-sfs-workflow, Branch: $COMMIT_BRANCH, Run: ${{ github.run_id }}" \
+              || echo "⚠️ PR may already exist"
+
             # Switch back to original branch
             git checkout "$CURRENT_BRANCH"
-
-            echo "Commits gjorda i branch '${COMMIT_BRANCH}' istället för '${CURRENT_BRANCH}'"
+
+            echo "Commits gjorda i branch '${COMMIT_BRANCH}'"
           fi

       - name: Trigger HTML export workflow
-        if: steps.commit_changes.outputs.has_changes == 'true' && steps.commit_changes.outputs.push_success == 'true'
+        if: (inputs.enable_git_commit != 'false' && steps.commit_changes.outputs.has_changes == 'true' && steps.commit_changes.outputs.push_success == 'true') || (inputs.enable_git_commit == 'false')
         uses: actions/github-script@v6
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |
+            // Always use the fixed branch name
+            const sourceRef = '${{ inputs.branch_name }}';
+
             github.rest.actions.createWorkflowDispatch({
               owner: context.repo.owner,
               repo: context.repo.repo,
               workflow_id: 'html-export-workflow.yml',
-              ref: context.ref
+              ref: sourceRef || context.ref,
+              inputs: {
+                source_ref: sourceRef || 'main'
+              }
             })

       - name: Report push failure

.github/workflows/html-export-workflow.yml

Lines changed: 36 additions & 4 deletions
@@ -45,20 +45,52 @@ jobs:
           python -m pip install --upgrade pip
           pip install -r requirements.txt

+      - name: Get JSON source files (from git or R2)
+        run: |
+          # Try to use JSON files from git first
+          if [ -d "data/sfs_json" ] && [ -n "$(ls -A data/sfs_json 2>/dev/null)" ]; then
+            echo "✅ Found $(find data/sfs_json -name '*.json' | wc -l) JSON files in git"
+            echo "Using JSON files from git checkout"
+          else
+            echo "⚠️ No JSON files in git, downloading from Cloudflare R2..."
+
+            # Configure AWS CLI for R2
+            aws configure set aws_access_key_id ${{ secrets.CLOUDFLARE_R2_ACCESS_KEY_ID }}
+            aws configure set aws_secret_access_key ${{ secrets.CLOUDFLARE_R2_SECRET_ACCESS_KEY }}
+            aws configure set region us-east-1
+            aws configure set output json
+
+            # Download all JSON files from R2
+            mkdir -p data/sfs_json
+            aws s3 sync s3://${{ secrets.CLOUDFLARE_R2_BUCKET_NAME }}/sfs_json/ data/sfs_json/ \
+              --endpoint-url https://${{ secrets.CLOUDFLARE_R2_ACCOUNT_ID }}.r2.cloudflarestorage.com \
+              --exclude "*" \
+              --include "*.json"
+
+            # Verify download
+            if [ ! -d "data/sfs_json" ] || [ -z "$(ls -A data/sfs_json)" ]; then
+              echo "::error::Failed to download JSON files from R2"
+              exit 1
+            fi
+            echo "✅ Downloaded $(find data/sfs_json -name '*.json' | wc -l) JSON files from R2"
+          fi
+        env:
+          AWS_DEFAULT_REGION: us-east-1
+
       - name: Generate HTML export
         run: |
           if [ -n "${{ inputs.filter }}" ]; then
-            python sfs_processor.py --input sfs_json --output output/html --formats html --filter "${{ inputs.filter }}"
+            python sfs_processor.py --input data/sfs_json --output output/html --formats html --filter "${{ inputs.filter }}"
           else
-            python sfs_processor.py --input sfs_json --output output/html --formats html
+            python sfs_processor.py --input data/sfs_json --output output/html --formats html
           fi
         env:
           PYTHONPATH: ${{ github.workspace }}

       - name: Regenerate index pages for HTML export
         run: |
-          python exporters/html/populate_index_pages.py --input sfs_json --output index.html --limit 30
-          python exporters/html/populate_index_pages.py --input sfs_json --output latest.html --limit 10
+          python exporters/html/populate_index_pages.py --input data/sfs_json --output index.html --limit 30
+          python exporters/html/populate_index_pages.py --input data/sfs_json --output latest.html --limit 10
         env:
           PYTHONPATH: ${{ github.workspace }}
6496

.github/workflows/upcoming-changes-workflow.yml

Lines changed: 20 additions & 5 deletions
@@ -32,7 +32,7 @@ jobs:
       - name: Run upcoming changes script
         run: |
           # Kör scriptet på markdown-katalogen för att uppdatera kommande.yaml
-          python temporal/upcoming_changes.py output/md/
+          python temporal/upcoming_changes.py data/md-markers/
         env:
           PYTHONPATH: ${{ github.workspace }}

@@ -42,7 +42,7 @@ jobs:
           # Detta gör att vi kan hämta ikapp missade dagar om jobbet misslyckats
           eval $(python temporal/get_temporal_date_range.py)
           echo "Bearbetar temporal commits från $FROM_DATE till $TO_DATE"
-          python scripts/temporal_commits_batch.py output/md/ --from-date "$FROM_DATE" --to-date "$TO_DATE" --verbose
+          python scripts/temporal_commits_batch.py data/md-markers/ --from-date "$FROM_DATE" --to-date "$TO_DATE" --verbose
         env:
           PYTHONPATH: ${{ github.workspace }}
           GIT_GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
@@ -54,13 +54,28 @@ jobs:

       - name: Commit and push kommande.yaml updates
         run: |
-          # Lägg till den uppdaterade kommande.yaml filen
+          # Checkout the same fixed branch as fetch-sfs uses
+          WORKFLOW_BRANCH="workflow-artifact-data" # Match default from fetch-sfs
+
+          # Fetch and checkout the fixed branch
+          echo "📥 Checking out fixed branch '$WORKFLOW_BRANCH'..."
+          git fetch origin "$WORKFLOW_BRANCH" 2>/dev/null || echo "Branch doesn't exist yet"
+
+          if git ls-remote --heads origin "$WORKFLOW_BRANCH" | grep -q "$WORKFLOW_BRANCH"; then
+            git checkout "$WORKFLOW_BRANCH"
+            echo "✅ Checked out existing branch '$WORKFLOW_BRANCH'"
+          else
+            git checkout -b "$WORKFLOW_BRANCH"
+            echo "🆕 Created new branch '$WORKFLOW_BRANCH'"
+          fi
+
+          # Make changes
           git add output/kommande.yaml

           if git diff --staged --quiet; then
             echo "Inga ändringar i kommande.yaml"
           else
             git commit -m "Uppdatera kommande ändringar - $(date +'%Y-%m-%d')"
-            git push
-            echo "✅ Kommande.yaml har uppdaterats och pushats"
+            git push origin "$WORKFLOW_BRANCH"
+            echo "✅ Kommande.yaml uppdaterat på branch $WORKFLOW_BRANCH"
           fi
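
Both this workflow and `fetch-sfs-workflow.yml` detect the fixed branch with `git ls-remote --heads origin "$BRANCH" | grep -q "$BRANCH"`. Since `git ls-remote` matches patterns against the tail of a ref name and `grep` matches substrings, a ref such as `refs/heads/feature/workflow-artifact-data` could plausibly be mistaken for the fixed branch. A minimal sketch of an exact-match check over the same `ls-remote` output; `has_exact_head` is a hypothetical helper name, not part of this commit:

```shell
#!/bin/sh
# Read `git ls-remote --heads` output (lines of "<sha><TAB><ref>") on stdin
# and succeed only if one ref is exactly refs/heads/<branch>.
has_exact_head() {
  branch=$1
  while read -r _sha ref; do
    if [ "$ref" = "refs/heads/$branch" ]; then
      return 0
    fi
  done
  return 1
}

# Demo with canned input; in the workflow the input would come from
#   git ls-remote --heads origin "$WORKFLOW_BRANCH"
if printf '%s\t%s\n' 0123abc refs/heads/workflow-artifact-data \
    | has_exact_head workflow-artifact-data; then
  echo "branch exists"
else
  echo "branch missing"
fi
```

Passing the full ref (`refs/heads/$WORKFLOW_BRANCH`) as the `ls-remote` pattern together with `--exit-code` would be another way to get an exact answer without `grep`.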

.gitignore

Lines changed: 7 additions & 2 deletions
@@ -16,6 +16,11 @@ htmlcov/
 # Ignore input files
 sfs_docs

-# Ignore output files
+# Ignore output files (generated, not committed)
 logs/
-output/
+output/
+
+# Data directory contains source files committed to git:
+# - data/sfs_json/ = Käll-JSON från API
+# - data/md-markers/ = Markdown med selex-taggar
+# (nothing to ignore here, committed to git + backed up to R2)
