Commit c33e051
Refactor LDC integration, add LDCHook provider, and include DAG structure test
- Added a new Airflow LDC provider under mokelumne/providers/ldc with provider.yaml and get_provider_info.py.
- Implemented LDCHook for authenticated LDC catalog access, including session creation, refresh handling, corpora page retrieval, and download response streaming.
- Refactored mokelumne/dags/fetch_ldc_corpus.py to use LDCHook for HTTP operations while keeping file writing in the DAG.
- Moved corpus filtering logic into mokelumne.util.ldc.filter_corpora and updated the DAG to consume parsed metadata lists.
- Added/updated unit tests for LDCHook, util LDC helpers, and DAG structure, including test_connection coverage.
- Updated imports for Airflow 3 compatibility and added public/ to .gitignore.

1 parent 3222ac4 commit c33e051
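The split described in the commit message (the hook handles HTTP and returns a streaming download response, while the DAG keeps the file writing) could look roughly like the sketch below. It is not the code from this commit: `write_stream`, `StreamingResponse`, and `FakeResponse` are hypothetical names, and the only assumption about the hook is that its download response exposes a requests-style `iter_content(chunk_size=...)` method.

```python
from pathlib import Path
from typing import Iterable, Protocol


class StreamingResponse(Protocol):
    """Hypothetical protocol: anything with requests-style chunk iteration."""

    def iter_content(self, chunk_size: int = ...) -> Iterable[bytes]: ...


def write_stream(response: StreamingResponse, dest: Path, chunk_size: int = 64 * 1024) -> int:
    """DAG-side helper (hypothetical): write a streamed body to dest.

    Returns the number of bytes written. The HTTP work (authentication,
    session refresh) is assumed to have happened already inside the hook.
    """
    written = 0
    with dest.open("wb") as fh:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                written += len(chunk)
    return written


class FakeResponse:
    """Stand-in for a hook's download response, for local experiments only."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def iter_content(self, chunk_size: int = 1024) -> Iterable[bytes]:
        # Yield the payload in fixed-size slices, like a streamed HTTP body.
        for start in range(0, len(self._payload), chunk_size):
            yield self._payload[start:start + chunk_size]
```

Keeping the writer free of any HTTP details means it can be exercised with a stand-in such as `FakeResponse` and no network access, which is one plausible reason for leaving file writing in the DAG.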
15 files changed
Lines changed: 659 additions & 10639 deletions
File tree
- mokelumne
  - dags
  - providers/ldc
    - hooks
  - util
- test
  - fixtures
- tests
  - unit
This file was deleted.