🚧 Repository under construction: the core ingestion pipeline is functional; analysis modules, views, and query helpers are coming next.
DigiMuh consolidates ~8.9 GB of heterogeneous dairy-cow CSV sensor data into a single normalised SQLite database. The data spans 3.5 years (April 2021 – September 2024) of continuous monitoring from multiple on-farm systems:
| System | What it measures |
|---|---|
| smaXtec bolus | Rumen temperature, pH, activity, motility, rumination, water intake, estrus/calving indices |
| smaXtec barn sensors | Barn temperature, humidity, THI |
| HerdePlus | Milking events, MLP test-day results, calving/lactation records |
| HerdePlus diseases | Health events and diagnoses |
| Gouna | Respiration frequency |
| BCS | Body condition scores |
| LoRaWAN | Environmental sensor battery/current |
| HOBO | Weather station (temperature, humidity, solar radiation, wind, wetness) |
| DWD | German Weather Service THI and enthalpy |
The database uses a star schema: four dimension tables (animals, sensors,
barns, source_files) and twelve fact tables, connected by integer foreign
keys. Every row carries a file_id for full provenance tracing back to the
original CSV.
See docs/database_structure.md for the full
schema and docs/column_dictionary.md for a
description of every column.
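To illustrate how a fact row can be traced back to its animal and its original CSV, here is a minimal sketch on a toy in-memory database. Only the table names animals, source_files, and smaxtec_derived and the file_id column are taken from this README; all other column names, the ear tag, and the file path are made up for the example, so check docs/column_dictionary.md for the real ones.

```python
import sqlite3

# Toy schema: animals, source_files, smaxtec_derived and file_id come from
# the README; every other column name below is a hypothetical stand-in.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE animals (animal_id INTEGER PRIMARY KEY, ear_tag TEXT UNIQUE);
CREATE TABLE source_files (file_id INTEGER PRIMARY KEY, path TEXT);
CREATE TABLE smaxtec_derived (
    animal_id INTEGER REFERENCES animals(animal_id),
    file_id   INTEGER REFERENCES source_files(file_id),
    ts        TEXT,
    temp      REAL
);
""")

# One invented row per table, just enough to show the joins.
con.execute("INSERT INTO animals VALUES (1, '276000000000001')")
con.execute(
    "INSERT INTO source_files VALUES (1, 'outputs_smaxtec_derived/example.csv')"
)
con.execute(
    "INSERT INTO smaxtec_derived VALUES (1, 1, '2021-04-01T00:00:00', 39.1)"
)

# Resolve the integer foreign keys: which animal, and which CSV, does each
# measurement come from?
rows = con.execute("""
SELECT a.ear_tag, f.path, d.temp
FROM smaxtec_derived AS d
JOIN animals      AS a ON a.animal_id = d.animal_id
JOIN source_files AS f ON f.file_id   = d.file_id
""").fetchall()
print(rows)
```

The same two-join pattern should carry over to any fact table, since every row is described as carrying a file_id.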
# Clone the repository
git clone https://github.com/zerotonin/digimuh.git
cd digimuh
# Option A: conda (recommended)
conda env create -f environment.yml
conda activate digimuh
# Option B: pip
# reRandomStats is not on PyPI; install it from the v0.2.0 git tag first,
# then the editable install picks it up locally to satisfy the dependency.
pip install "git+https://github.com/zerotonin/reRandomStats.git@v0.2.0"
pip install -e ".[dev]"

# 1. Smoke test with 5 files per folder (~1 min)
digimuh-ingest /path/to/DigiMuh-Export --db cow_test.db --test-n 5
# 2. Full ingestion (~2–3 hours)
rm cow_test.db
digimuh-ingest /path/to/DigiMuh-Export --db cow.db
# 3. Query the database
python -c "
import sqlite3
con = sqlite3.connect('cow.db')
cur = con.execute('SELECT COUNT(*) FROM smaxtec_derived')
print(f'smaxtec_derived rows: {cur.fetchone()[0]:,}')
"

The ingestion script expects the DigiMuh CSV export directory to have this structure:
DigiMuh-Export_2021-04-01_2024-09-30/
├── output_allocations/
│   └── allocations.csv
├── outputs_bcs/
│   └── {animal_id}_bcs_{date_range}.csv  ×715
├── outputs_gouna/
│   └── {animal_id}_gouna_{date_range}.csv  ×91
├── outputs_herdeplus_mlp_gemelk_kalbung/
│   └── {animal_id}_herdeplus_{date_range}.csv  ×965
├── outputs_hobo/
│   └── hobo_exports_{date_range}.csv
├── outputs_lorawan/
│   └── {sensor_name}_LoRaWAN_raw_{date_range}.csv  ×22
├── outputs_smaxtec_barns/
│   └── {barn_name}_smaxtec_raw_{date_range}.csv  ×4
├── outputs_smaxtec_derived/
│   └── {animal_id}_smaxtec_derived_{date_range}.csv  ×837
├── outputs_smaxtec_events/
│   └── {animal_id}_events.csv  ×837
├── outputs_smaxtec_water_intake/
│   └── {animal_id}_smaxtec_derived_{date_range}.csv  ×837
├── herdeplus_diseases.csv
└── outputs_dwd.csv
Animal IDs are 15-digit EU ear tag numbers. The entity identifier is always the first underscore-delimited segment of each filename.
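The filename rule above can be sketched in a few lines of Python. The helper names entity_id and looks_like_ear_tag are hypothetical and not part of the DigiMuh package, and the example filenames are invented; the sketch only assumes what the paragraph states, namely that the identifier is the first underscore-delimited segment and that animal IDs are 15 digits long.

```python
from pathlib import Path


def entity_id(filename: str) -> str:
    """Return the first underscore-delimited segment of the file's stem."""
    # Path(...).stem drops the directory part and the final ".csv" suffix.
    return Path(filename).stem.split("_", 1)[0]


def looks_like_ear_tag(segment: str) -> bool:
    """True if the segment is exactly 15 ASCII digits (EU ear tag format)."""
    return len(segment) == 15 and segment.isascii() and segment.isdigit()


# Invented example filenames following the patterns in the tree above.
animal = entity_id("outputs_bcs/276000000000001_bcs_2021-04-01_2024-09-30.csv")
station = entity_id("outputs_hobo/hobo_exports_2021-04-01_2024-09-30.csv")
print(animal, looks_like_ear_tag(animal))
print(station, looks_like_ear_tag(station))
```

A check like looks_like_ear_tag is one way to tell animal files from sensor or barn files, whose first segment is a name rather than an ear tag.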
digimuh-ingest [-h] [--db DB] [--chunk-size N] [--verbose] [--test-n N] root_dir
| Argument | Description |
|---|---|
| root_dir | Root directory containing all CSV folders |
| --db | Output SQLite path (default: cow.db) |
| --chunk-size | Rows per INSERT batch (default: 50 000) |
| --test-n N | Only ingest first N files per folder |
| --verbose, -v | Print CREATE TABLE SQL and debug info |
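To give a feel for what a "rows per INSERT batch" setting controls, here is a generic sketch of chunked inserts with the standard sqlite3 module. It is not taken from the DigiMuh source: the function insert_in_chunks and the demo table are hypothetical, and how digimuh-ingest actually batches its writes may differ.

```python
import sqlite3
from itertools import islice


def insert_in_chunks(con, sql, rows, chunk_size=50_000):
    """Insert rows in batches of at most chunk_size; return the batch count."""
    it = iter(rows)
    n_batches = 0
    while True:
        # Pull the next batch lazily so the full input never sits in memory.
        batch = list(islice(it, chunk_size))
        if not batch:
            break
        with con:  # commit once per batch (rolls back if the batch fails)
            con.executemany(sql, batch)
        n_batches += 1
    return n_batches


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (x INTEGER)")

# Ten rows in batches of four: two full batches plus one partial batch.
n = insert_in_chunks(
    con, "INSERT INTO demo VALUES (?)", ((i,) for i in range(10)), chunk_size=4
)
count = con.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
print(n, count)
```

Larger batches generally mean fewer commits and faster ingestion at the cost of more memory per batch, which is the usual trade-off behind a flag like --chunk-size.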
python -m pytest

After ingestion, six analysis scripts are available as CLI commands. Each creates analysis views on first run, queries the database, and writes results (CSV data + figures) to an output directory.
# Install with analysis dependencies
pip install -e ".[analysis]"
# 0. Individual heat stress thresholds (broken-stick regression)
digimuh-broken-stick --db cow.db --tierauswahl Tierauswahl.xlsx --out results/broken_stick
# 1. Subclinical ketosis risk: FPR × rumination × milk yield
digimuh-ketosis --db cow.db --out results/ketosis
# 2. Heat stress: rumen temp × THI × water × respiration
digimuh-heat --db cow.db --out results/heat
# 3. Digestive efficiency: motility × pH → milk composition (time-lagged)
digimuh-digestive --db cow.db --out results/digestive
# 4. Circadian disruption: 24h Fourier decomposition as welfare marker
digimuh-circadian --db cow.db --out results/circadian
# 5. Motility entropy: rumen HRV analogue via information theory
digimuh-entropy --db cow.db --out results/entropy

Each script writes:
- A CSV of the extracted features (for further analysis in R, Python, etc.)
- Publication-ready SVG + PNG figures
- A JSON summary of key results (where applicable)
See docs/database_structure.md for the SQL view
definitions that power these analyses.
- CSV → SQLite ingestion with star schema
- SQL views for analysis (daily summaries + cross-table joins)
- Analysis: individual heat stress thresholds (broken-stick regression)
- Analysis: subclinical ketosis detection (FPR + RF classifier)
- Analysis: heat stress multi-sensor fusion
- Analysis: digestive efficiency (motility–pH coupling)
- Analysis: circadian rhythm disruption index
- Analysis: motility pattern entropy (novel)
- Data validation and quality-check reports
- Parallelised entropy computation for full dataset
- Sphinx documentation on GitHub Pages
Bart R. H. Geurten, Department of Zoology, University of Otago
If you use DigiMuh in your research, please cite both DigiMuh and the
reRandomStats statistics toolkit it consumes for breakpoint detection
(broken-stick regression, Davies / Pseudo-Score tests, 4-parameter
Hill fit) and FDR correction. The version you used should match the
DOI you cite; full metadata for DigiMuh is in CITATION.cff
and available via the "Cite this repository" button on the GitHub repo.
Geurten, B. R. H. (2026). DigiMuh: Dairy-cow sensor data ingestion and heat-stress analysis pipeline (Version 1.0.0) [Software]. Zenodo. https://doi.org/10.5281/zenodo.20389795
@software{geurten_digimuh_v100,
author = {Geurten, Bart R. H.},
title = {{DigiMuh}: Dairy-cow sensor data ingestion and heat-stress
analysis pipeline},
year = {2026},
version = {1.0.0},
doi = {10.5281/zenodo.20389795},
url = {https://github.com/zerotonin/digimuh},
license = {MIT},
}
@software{geurten_rerandomstats_v020,
author = {Geurten, Bart R. H.},
title = {{reRandomStats}: Re-randomisation Statistics Toolkit},
year = {2026},
version = {0.2.0},
doi = {10.5281/zenodo.20387255},
url = {https://github.com/zerotonin/reRandomStats},
license = {MIT},
}

Note for Elsevier submissions: Elsevier Editorial Manager does not parse
@software; convert to @misc at submission time per the lab BibTeX convention.
MIT