A three-wave in-the-wild smartphone and wearable sensing dataset for moment-level affect modeling in everyday life.
Dataset name, DOI, paper title, authors, and licenses are finalized on acceptance/public release. This repository uses the working name
EmoPhone; final names and legal terms will be substituted throughout before release.
This repository contains the documentation, metadata, preprocessing notes, and benchmark scaffolding for the D1, D2, and D3 datasets: three consecutive annual waves of an in-the-wild affective sensing study. Each wave pairs passive sensor data from participants' own Android smartphones and Fitbit wearables with dense in-situ affective state labels collected via the Experience Sampling Method (ESM).
Together, the three waves provide 53,139 ESM responses from 297 participants (328 recruited; 9.8% / 11.6% / 7.0% excluded by QC in D1 / D2 / D3), collected between 2020-02-07 and 2022-01-11 in South Korea.
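The headline totals and exclusion rates can be cross-checked against the per-wave counts reported in this README. The snippet below is a minimal sketch of that arithmetic; the dictionary layout and variable names are illustrative and not part of the released tooling.

```python
# Per-wave counts copied from this README:
# (recruited, retained after QC, post-QC ESM responses).
waves = {
    "D1": (102, 92, 10_259),
    "D2": (112, 99, 21_042),
    "D3": (114, 106, 21_838),
}

total_recruited = sum(recruited for recruited, _, _ in waves.values())
total_retained = sum(retained for _, retained, _ in waves.values())
total_responses = sum(responses for _, _, responses in waves.values())

# Share of recruited participants excluded by QC, per wave, in percent.
excluded_pct = {
    name: round(100 * (recruited - retained) / recruited, 1)
    for name, (recruited, retained, _) in waves.items()
}

# Mean post-QC responses per retained participant, per wave.
mean_responses = {
    name: round(responses / retained, 1)
    for name, (_, retained, responses) in waves.items()
}

print(total_recruited, total_retained, total_responses)
print(excluded_pct)
print(mean_responses)
```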
The dataset is designed to support:
- Moment-level affect prediction (valence, arousal, stress, task disturbance)
- Within-person (personalized) vs. cross-person generalization benchmarks
- Cross-dataset (cross-wave) transfer-learning experiments
- Longitudinal and individual-difference analyses in mobile affective computing
Figure 1. Three-wave study protocol, sensing streams, and ESM label evolution.
| Property | D1 | D2 | D3 |
|---|---|---|---|
| Collection period | Feb 7 - Apr 2, 2020 (~30 d) | Dec 7, 2020 - Jan 27, 2021 (~30 d) | Nov 23, 2021 - Jan 11, 2022 (~28 d) |
| Recruited / retained (QC) | 102 / 92 | 112 / 99 | 114 / 106 |
| ESM responses (post-QC) | 10,259 | 21,042 | 21,838 |
| Mean responses / participant | 111.5 (SD 51.1) | 212.5 (SD 44.1) | 206.0 (SD 24.7) |
| Smartphone | Android >= 7.0 | Android >= 8.0 | Android >= 8.0 |
| Wearable | Fitbit Inspire HR + Polar H10 (sub-period) | Fitbit Inspire HR + Polar H10 (sub-period) | Fitbit Inspire HR only |
| Feature columns (total) | 8,037 | 10,122 | 10,581 |
| Shared labels | Valence, Arousal, Stress, Disturbance | Valence, Arousal, Stress, Disturbance | Valence, Arousal, Stress, Disturbance |
| Wave-specific labels | Attention, Mental, Duration, Change | Attention, Mental, Duration, ValenceChange, ArousalChange | 8 PANAS-style words (Happy, Relaxed, Cheerful, Content, Sad, Anxious, Depressed, Angry) |
Shared core labels: Valence, Arousal, Stress, Task Disturbance. All use a 7-point scale from −3 to +3, except that D3 records Stress and Disturbance on 0 to +6; the benchmark normalises these to −3 to +3.
D3 affect-word labels (0 to +6): Happy, Relaxed, Cheerful, Content, Sad, Anxious, Depressed, Angry.
See data/schema.md for the full label reference and the column-naming convention.
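The D3 rescaling of Stress and Disturbance can be pictured as a constant shift. This is only a sketch: it assumes the 0 to +6 responses map onto −3 to +3 by subtracting 3, and `normalise_d3_label` is a hypothetical helper name. The authoritative rule is the one recorded in preprocessing/pipeline_decisions.md.

```python
def normalise_d3_label(value: int) -> int:
    """Shift a D3 Stress/Disturbance response from 0..6 onto -3..+3 (assumed rule)."""
    if not 0 <= value <= 6:
        raise ValueError(f"expected a 0..6 response, got {value}")
    return value - 3


# Every possible raw D3 response and its assumed benchmark-scale value.
print([normalise_d3_label(v) for v in range(7)])
```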
| Setting | Scenario | Evaluation | Method families |
|---|---|---|---|
| Setting A | Personal-history predictability | 60/20/20 chronological split per user (first 30 days), concatenated across users | Baseline + tabular-NN |
| Setting B | Within-dataset cross-user transfer | Stratified group 5-fold by Pcode, evaluated per wave | Baseline + tabular-NN + DG + DA |
| Setting C | Cross-dataset transfer across waves | Leave-one-dataset-out (1→1 and 2→1), shared labels only, common-feature intersection | Baseline + tabular-NN + DG + DA |
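
The Setting A split in the table above (60/20/20 in time order per user, restricted to the first 30 days, then concatenated) could look roughly like the following. This is a sketch under assumptions: rows are reduced to hypothetical `(pcode, timestamp)` tuples, the 30-day window is measured from each participant's first response, and `chronological_split` is an illustrative name rather than the benchmark's loader.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def chronological_split(rows, window_days=30, percent=(60, 20)):
    """Split (pcode, timestamp) rows into train/val/test per participant, in time order."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    train, val, test = [], [], []
    for pcode in sorted(by_user):
        user_rows = sorted(by_user[pcode], key=lambda r: r[1])
        # Keep only responses inside the participant's first `window_days` days.
        cutoff = user_rows[0][1] + timedelta(days=window_days)
        user_rows = [r for r in user_rows if r[1] < cutoff]
        n = len(user_rows)
        n_train = n * percent[0] // 100
        n_val = n * percent[1] // 100
        # Earliest responses train, the next block validates, the latest block tests.
        train += user_rows[:n_train]
        val += user_rows[n_train:n_train + n_val]
        test += user_rows[n_train + n_val:]
    return train, val, test


# Toy data: two hypothetical participants with 10 daily responses each.
start = datetime(2020, 2, 7)
rows = [(p, start + timedelta(days=d)) for p in ("P01", "P02") for d in range(10)]
train, val, test = chronological_split(rows)
print(len(train), len(val), len(test))
```

Splitting per participant before concatenating keeps every test response later in time than that participant's own training responses.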
All settings use AUROC as the primary metric with Accuracy / Macro-F1 / Precision / Recall reported for diagnostics. Hyperparameters are tuned with Optuna (30 trials, validation-AUROC selection); training uses a unified loop (≤ 50 epochs, patience-based early stopping, fixed seed).
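For readers less familiar with the primary metric, AUROC can be computed as the fraction of positive/negative pairs in which the positive example receives the higher score. The pure-Python sketch below assumes labels are already binarised to 0/1 (the binarisation rule is not specified here) and is meant for illustration; actual runs would more likely rely on a library routine such as scikit-learn's `roc_auc_score`.

```python
def auroc(y_true, y_score):
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    # Count pairwise wins; a tie contributes half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))
```

The pairwise form is quadratic in the number of samples, so it suits small checks rather than full benchmark evaluation.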
See benchmark/README.md for setting-level details and reproduction pointers.
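As an illustration of the Setting B idea (5 folds, grouped by Pcode so that no participant appears in both training and test data, with label balance across folds), the sketch below uses a simple greedy assignment. The greedy rule and the name `stratified_group_folds` are assumptions made for this example; the benchmark may use a different implementation, for instance scikit-learn's `StratifiedGroupKFold`.

```python
from collections import Counter


def stratified_group_folds(pcodes, labels, n_folds=5):
    """Assign every participant (group) to one fold while balancing positive labels."""
    size = Counter(pcodes)
    positives = Counter(p for p, y in zip(pcodes, labels) if y == 1)
    fold_of = {}
    fold_pos = [0] * n_folds
    fold_size = [0] * n_folds
    # Place label-heavy participants first, each into the currently lightest fold.
    for pcode in sorted(size, key=lambda p: (-positives[p], -size[p], p)):
        k = min(range(n_folds), key=lambda i: (fold_pos[i], fold_size[i], i))
        fold_of[pcode] = k
        fold_pos[k] += positives[pcode]
        fold_size[k] += size[pcode]
    return fold_of


# Toy data: 10 hypothetical participants with 4 binary-labelled responses each.
pcodes = [f"P{i:02d}" for i in range(10) for _ in range(4)]
labels = [1 if (i + j) % 3 == 0 else 0 for i in range(10) for j in range(4)]
fold_of = stratified_group_folds(pcodes, labels)
for k in range(5):
    print(k, sorted(p for p, f in fold_of.items() if f == k))
```

Because folds are assigned per participant rather than per response, each held-out fold contains only users the model has never seen.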
Baselines: XGBoost, LightGBM, MLP, ResNet. Tabular neural networks: TabNet, SAINT, TabTransformer, FTTransformer, DCN. Domain generalization (DG): IRM, VREx, GroupDRO, MixStyle, MLDG, MASF, Fish, CSD, SagNet. Domain adaptation (DA): DANN, CDAN, DAN, DeepCORAL, MCC, ADDA, MCD, JAN, SHOT, CBST, CGDM.
DG and DA models share an MLP backbone so that family differences reflect the objective, not the backbone. DG methods follow DomainBed protocols; DA methods follow the Transfer-Learning-Library (TLL); tabular NNs follow their respective upstream repositories.
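To make "the objective, not the backbone" concrete, the sketch below applies three different objectives to the same per-domain losses, as if they came from one shared MLP. It is a simplified illustration and not the benchmark code: the V-REx form (mean risk plus a weighted variance penalty) follows the commonly used formulation, the worst-group function is only the hard-max limit of GroupDRO, and the loss values are made up.

```python
from statistics import fmean, pvariance


def erm_objective(domain_losses):
    """Plain empirical risk: average the per-domain losses."""
    return fmean(domain_losses)


def vrex_objective(domain_losses, penalty_weight=1.0):
    """V-REx style objective: mean risk plus a penalty on the variance across domains."""
    return fmean(domain_losses) + penalty_weight * pvariance(domain_losses)


def worst_group_objective(domain_losses):
    """Hard worst-case limit of GroupDRO: optimise the domain with the highest loss."""
    return max(domain_losses)


# Hypothetical per-domain losses produced by one shared MLP backbone.
losses = [0.40, 0.50, 0.90]
print(erm_objective(losses), vrex_objective(losses), worst_group_objective(losses))
```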
The README keeps the study-protocol figure inline and groups the remaining reviewer-facing visuals here so the main flow stays readable. All image assets live under images/ with stable, lowercase filenames.
Reviewer figure gallery panels:
- Dataset characteristics
- Sensor and label coverage
- ESM response timing
- Cross-user embedding structure
- Cross-wave embedding structure
All three waves were processed using the reproducible mobile-sensing pipeline introduced in:
Zhang, P., Jung, G., Alikhanov, J., Ahmed, U., & Lee, U. (2024). A Reproducible Stress Prediction Pipeline with Mobile Sensor Data. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 8(3). https://doi.org/10.1145/3678578
Per-wave QC thresholds, feature-alignment decisions, and scale-normalisation rules are documented in preprocessing/pipeline_decisions.md and docs/feature_alignment.md.
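Feature alignment matters most for Setting C, where models are trained on one or two waves and tested on another using only features the waves share. The sketch below reads "1→1 and 2→1" as one-source and two-source transfers to a single target wave, and intersects column names for the waves involved in each transfer. Both readings are assumptions, the column names are invented, and the helpers `transfer_pairs` and `common_features` are hypothetical; the documented alias map is not modelled here.

```python
from itertools import combinations


def transfer_pairs(waves):
    """Enumerate one-source and two-source (sources, target) combinations."""
    pairs = []
    for target in waves:
        others = [w for w in waves if w != target]
        pairs += [((src,), target) for src in others]
        pairs += [(duo, target) for duo in combinations(others, 2)]
    return pairs


def common_features(columns_by_wave, waves):
    """Columns present in every listed wave, kept in the first wave's order."""
    shared = set.intersection(*(set(columns_by_wave[w]) for w in waves))
    return [c for c in columns_by_wave[waves[0]] if c in shared]


# Hypothetical column names; the real schema is described in data/schema.md.
columns_by_wave = {
    "D1": ["hr_mean", "steps_sum", "polar_rmssd"],
    "D2": ["hr_mean", "steps_sum", "polar_rmssd", "app_social_dur"],
    "D3": ["steps_sum", "hr_mean", "app_social_dur"],
}
pairs = transfer_pairs(["D1", "D2", "D3"])
for sources, target in pairs:
    print(sources, "->", target, common_features(columns_by_wave, [*sources, target]))
```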
```
EmoPhone/
│
├── README.md ← this file
├── DATASHEET.md ← Gebru-style datasheet (NeurIPS D&B requirement)
├── LICENSE ← code license placeholder (TBD)
├── LICENSE-DATA.md ← data license / DUA placeholder (TBD)
├── CITATION.cff ← machine-readable citation
├── AUTHORS.md ← contributors
├── RESPONSIBILITY.md ← NeurIPS D&B author responsibility statement
├── MAINTENANCE.md ← hosting, versioning, SLA
├── CHANGELOG.md ← release history
├── CONTRIBUTING.md ← PR / issue guidance
├── requirements.txt ← top-level pinned Python deps
├── environment.yml ← optional conda mirror
│
├── docs/ ← long-form docs
│   ├── dataset_overview.md
│   ├── feature_alignment.md ← cross-wave feature schema & alias map
│   ├── ethics.md
│   ├── consent_form_en.md ← translated consent form (placeholder)
│   └── neurips_db_checklist.md ← reviewer-facing compliance checklist
│
├── data/
│   ├── README.md ← Dataverse access + how to load files
│   └── schema.md ← column-by-column reference
│
├── preprocessing/
│   ├── README.md ← pipeline overview & Zhang 2024 reference
│   ├── pipeline_decisions.md ← per-wave QC, scale normalisation, alias map
│   └── feature_alignment.md ← mirrors docs/feature_alignment.md
│
├── benchmark/
│   ├── README.md ← three-setting ladder + model inventory
│   ├── setting_a/README.md ← personal-history predictability
│   ├── setting_b/README.md ← within-wave cross-user transfer
│   ├── setting_c/README.md ← cross-wave transfer
│   ├── utils/README.md ← shared loader / metric contract
│   └── results/ ← committed per-task CSV/JSON summaries
│
├── basemodel-benchmarking/ ← Setting A/B/C baseline + tabular-NN runs (code + outputs)
├── domain_adaptation/ ← Setting B/C DG + DA runs (code)
│
├── EDA/ ← dataset-characterisation notebooks
├── images/ ← figures referenced by READMEs
└── metadata/
    ├── croissant.json ← ML Commons Croissant metadata
    └── checksums.md5 ← per-archive SHA/MD5 (populated on release)
```
The planned release target is Harvard Dataverse with gated access. Users will log in, agree to the Data Use Agreement (DUA), and then download. The final landing page, file IDs, DOI, license, and DUA text will be populated on acceptance/public release.
To download:
- Visit the dataset page at [TBD Harvard Dataverse URL — populated on acceptance].
- Log in or create a free Harvard Dataverse account.
- Read and agree to the Data Use Agreement.
- Download individual wave archives (`D1.zip`, `D2.zip`, `D3.zip`) or the full dataset.
Download via the Dataverse API (after agreeing to terms on the website):
```bash
# D1 as an example
curl -L "https://dataverse.harvard.edu/api/access/datafile/TBD_FILE_ID_D1" \
  -H "X-Dataverse-key: YOUR_API_TOKEN" \
  -o D1.zip
```

Python equivalent:

```python
import requests

api_token = "YOUR_API_TOKEN"  # Dataverse account → API Token
file_id = "TBD_FILE_ID_D1"    # from the Dataverse dataset page

r = requests.get(
    f"https://dataverse.harvard.edu/api/access/datafile/{file_id}",
    headers={"X-Dataverse-key": api_token},
)
r.raise_for_status()  # fail on 401/403/404 instead of saving an error body as D1.zip

with open("D1.zip", "wb") as f:
    f.write(r.content)
```

See data/README.md for the full data-folder structure and loading examples.
Verify integrity after download:

```bash
md5sum -c metadata/checksums.md5
```

- Python: 3.10 recommended; 3.9–3.11 supported.
- Install: `pip install -r requirements.txt` (or `conda env create -f environment.yml`).
- Seeds: all benchmark runs use a fixed seed (documented per setting in `benchmark/setting_*/README.md`).
- HPO: Optuna with 30 trials, validation-AUROC selection, setting-specific validation split.
- Training loop: maximum 50 epochs, patience-based early stopping, unified across baseline / DG / DA families.
- Split definitions, preprocessing policy, and model-selection rule are held fixed across families within each setting.
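
The training-loop rule listed above (at most 50 epochs, stopping once validation performance stops improving) can be sketched as follows. The patience value of 5, the function name `early_stopping_epoch`, and the validation curve are assumptions for illustration; the values actually used are documented per setting in the benchmark READMEs.

```python
def early_stopping_epoch(val_scores, max_epochs=50, patience=5):
    """Return (best_epoch, best_score), stopping after `patience` epochs without improvement."""
    best_score, best_epoch = float("-inf"), -1
    for epoch, score in enumerate(val_scores[:max_epochs]):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break  # validation AUROC has not improved for `patience` epochs
    return best_epoch, best_score


# Hypothetical validation-AUROC curve standing in for real training.
curve = [0.60, 0.64, 0.66, 0.65, 0.66, 0.64, 0.63, 0.65, 0.70]
print(early_stopping_epoch(curve, patience=5))
```

With a patience of 5 the run stops before reaching the late improvement at the end of the curve, which is the usual trade-off between compute budget and the chance of a late gain.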
```bibtex
@inproceedings{emophone2026,
  title = {[Dataset Title — TBD on acceptance]},
  author = {[Author list — TBD on acceptance]},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track},
  year = {2026},
  doi = {TBD},
  url = {TBD}
}

@article{zhang2024reproducible,
  title = {A Reproducible Stress Prediction Pipeline with Mobile Sensor Data},
  author = {Zhang, Panyu and Jung, Gyuwon and Alikhanov, Jumabek and Ahmed, Uzair and Lee, Uichin},
  journal = {Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.},
  volume = {8},
  number = {3},
  year = {2024},
  doi = {10.1145/3678578}
}
```

See CITATION.cff for the machine-readable form.
- Code license: TBD; see `LICENSE`.
- Data license and DUA: TBD; see `LICENSE-DATA.md`.
All data collection was approved by the Institutional Review Board at [TBD institution] (approval number [TBD]). Participants provided written informed consent and received ~100 USD compensation. The following anonymisation was applied prior to release: MD5-hashed contact numbers, UUID-replaced Wi-Fi/Bluetooth MAC addresses, and per-participant random displacement of GPS longitude. See docs/ethics.md and DATASHEET.md § 2.
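The three anonymisation steps named above could be implemented along these lines. This is a hedged sketch, not the authors' pipeline: the helper names, the case-insensitive MAC handling, the ±1 degree shift range, and the seed are all assumptions introduced for illustration.

```python
import hashlib
import random
import uuid


def hash_contact(number: str) -> str:
    """Replace a contact number with its MD5 hex digest."""
    return hashlib.md5(number.encode("utf-8")).hexdigest()


class MacPseudonymiser:
    """Map each MAC address to a random UUID, reusing it on repeat sightings."""

    def __init__(self):
        self._table = {}

    def __call__(self, mac: str) -> str:
        key = mac.lower()  # assumed: treat MACs case-insensitively
        if key not in self._table:
            self._table[key] = str(uuid.uuid4())
        return self._table[key]


def longitude_offsets(pcodes, max_shift_deg=1.0, seed=0):
    """Draw one fixed random longitude shift per participant (range and seed assumed)."""
    rng = random.Random(seed)
    return {p: rng.uniform(-max_shift_deg, max_shift_deg) for p in pcodes}


pseudo = MacPseudonymiser()
offsets = longitude_offsets(["P01", "P02"])
print(hash_contact("010-0000-0000"))
print(pseudo("AA:BB:CC:DD:EE:FF") == pseudo("aa:bb:cc:dd:ee:ff"))
print(127.36 + offsets["P01"])  # every longitude of P01 receives the same shift
```

Using a single offset per participant preserves relative movement within a participant's trace while hiding absolute positions.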
The authors accept full responsibility for any rights violations arising from this release; see RESPONSIBILITY.md.
Maintenance plan and versioning policy: MAINTENANCE.md. Change history: CHANGELOG.md.
For questions about the dataset or code, contact [TBD contact email — populated on acceptance] or open a GitHub issue. Contribution guidance: CONTRIBUTING.md.





