Commit c33e051

Refactor LDC integration, add LDCHook provider, and include DAG structure test
- Added a new Airflow LDC provider under mokelumne/providers/ldc with provider.yaml and get_provider_info.py.
- Implemented LDCHook for authenticated LDC catalog access, including session creation, refresh handling, corpora page retrieval, and download response streaming.
- Refactored mokelumne/dags/fetch_ldc_corpus.py to use LDCHook for HTTP operations while keeping file writing in the DAG.
- Moved corpus filtering logic into mokelumne.util.ldc.filter_corpora and updated the DAG to consume parsed metadata lists.
- Added/updated unit tests for LDCHook, util LDC helpers, and DAG structure, including test_connection coverage.
- Updated imports for Airflow 3 compatibility and added public/ to .gitignore.
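
The split described above (the hook streams the download response, the DAG writes the file) could look roughly like the sketch below on the DAG side. This is an illustration only, not the code from this commit: the function name write_stream_to_file is hypothetical, and the chunk iterable stands in for whatever LDCHook actually returns when streaming a download.

```python
from pathlib import Path
from typing import Iterable


def write_stream_to_file(chunks: Iterable[bytes], dest: Path) -> int:
    """Write streamed byte chunks to dest; return the number of bytes written.

    Hypothetical DAG-side helper: it knows nothing about HTTP or auth,
    which is assumed to stay inside the hook.
    """
    # Create the target directory if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with dest.open("wb") as fh:
        for chunk in chunks:
            if not chunk:
                # Skip empty chunks (streamed responses can emit them).
                continue
            written += fh.write(chunk)
    return written
```

Because the helper only takes an iterable of bytes, a DAG structure or unit test can feed it a plain list of chunks without any network access.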
1 parent 3222ac4 commit c33e051

15 files changed

Lines changed: 659 additions & 10639 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -15,4 +15,5 @@ airflow.cfg
 *.egg-info
 .coverage
 build/
+public/
 uv.lock

mokelumne/dags/fetch_ldc_corpus.py

Lines changed: 0 additions & 248 deletions
This file was deleted.
