Mild Cognitive Impairment (MCI) is an early stage of cognitive decline for which timely detection is important. Existing language-based approaches for MCI detection have shown promise, but most rely on static summaries of speech or text and may miss how semantic content changes over the course of a conversation. We propose a framework that captures dynamic semantic patterns from clinical conversations, including topic drift, prompt alignment, discourse coherence, and session-level semantic dispersion, and integrates them with conventional linguistic features for MCI detection. Our results show that modeling these semantic dynamics improves performance of MCI detection over the original linguistic baseline. These findings suggest that conversational semantic change may contribute to digital biomarkers for early cognitive decline.
To extract language markers from the transcripts, first extract syntactic complexity features using the L2 Syntactic Complexity Analyzer. Alternatively, use the GUI from the neosca GitHub repository to extract the same features.
Next, save the syntactic complexity features to rawdata/syntactic_complexity_measures.csv, place the transcripts in the Transcriptions folder, and run:
python feature_extractor.py
It will generate the new8-extended feature map in rawdata/id2feature.p (107D = 99 legacy + 8 new).
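As a quick sanity check on the generated file, the sketch below loads the pickle and verifies the vector length. It assumes that id2feature.p holds a dict-like mapping from an ID to a 107-dimensional vector and that the 8 new features are appended after the 99 legacy ones; neither detail is stated here, and the helper names `load_feature_map` and `split_legacy_new` are hypothetical.

```python
import pickle

import numpy as np

N_LEGACY, N_NEW = 99, 8  # from this README: 107D = 99 legacy + 8 new


def load_feature_map(path):
    """Load a pickled ID -> feature-vector mapping and validate its width.

    Assumption: the pickle stores a dict-like object whose values are
    107-dimensional vectors. Only unpickle files you trust.
    """
    with open(path, "rb") as f:
        id2feature = pickle.load(f)
    expected = N_LEGACY + N_NEW
    for key, vec in id2feature.items():
        width = np.asarray(vec).shape[-1]
        if width != expected:
            raise ValueError(f"{key!r}: expected {expected} features, got {width}")
    return id2feature


def split_legacy_new(vec):
    """Split a vector into (legacy, new) parts.

    Assumption: the 8 new features occupy the last 8 positions.
    """
    vec = np.asarray(vec)
    return vec[..., :N_LEGACY], vec[..., N_LEGACY:]
```

For example, `load_feature_map("rawdata/id2feature.p")` would return the mapping if every vector has 107 entries and raise a `ValueError` otherwise.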
This updated version includes 8 newly developed linguistic features organized into 4 semantic groups:
- Topic Drift (2 features):
  - topic_z_score: Z-score normalized topic coherence across conversation sequences
  - topic_z_score_indicator: Binary indicator for significant topic drift detection
- Prompt Alignment (1 feature):
  - prompt_align: Semantic alignment between participant responses and interviewer prompts
- Within-Session Coherence (3 features):
  - coh_mean: Mean coherence within a conversation session
  - coh_var: Variance of coherence within a session
  - coh_min: Minimum coherence within a session
- Session Semantic Dispersion (2 features):
  - sess_var_mean: Mean semantic variance across sessions
  - sess_var_norm: Normalized semantic variance across sessions
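To make the within-session coherence group more concrete, here is a minimal sketch of how such statistics could be computed. It assumes coherence is measured as the cosine similarity between embeddings of adjacent utterances; the definition actually used by feature_extractor.py may differ, and the function names are hypothetical.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def coherence_stats(embeddings):
    """Summarize adjacent-utterance similarity within one session.

    Assumption: `embeddings` is an (n_utterances, dim) array in
    conversation order, produced by any sentence encoder.
    """
    emb = np.asarray(embeddings, dtype=float)
    if emb.ndim != 2 or emb.shape[0] < 2:
        raise ValueError("need at least two utterance embeddings")
    sims = np.array([cosine(emb[i], emb[i + 1]) for i in range(len(emb) - 1)])
    return {
        "coh_mean": float(sims.mean()),
        "coh_var": float(sims.var()),  # population variance
        "coh_min": float(sims.min()),
    }
```

A low coh_min with a high coh_mean would indicate a single abrupt shift in an otherwise coherent session, which is why the minimum is reported alongside the mean and variance.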
To analyze the contribution of each new feature group, run leave-one-group-out ablation using:
python main_ablation_new8_logo.py --num_total_runs 100

The script automatically evaluates 4 settings:
- leave out topic_drift
- leave out prompt_alignment
- leave out within_session_coherence
- leave out session_semantic_dispersion
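The leave-one-group-out idea can be sketched as column selection on the feature matrix. The group and feature names come from the list in this README; the assumption that the 8 new features sit at indices 99 to 106 in the listed order is illustrative only and may not match main_ablation_new8_logo.py.

```python
import numpy as np

N_LEGACY = 99

# Group membership as described in this README.
NEW_FEATURE_GROUPS = {
    "topic_drift": ["topic_z_score", "topic_z_score_indicator"],
    "prompt_alignment": ["prompt_align"],
    "within_session_coherence": ["coh_mean", "coh_var", "coh_min"],
    "session_semantic_dispersion": ["sess_var_mean", "sess_var_norm"],
}

# Assumption: new features are stored after the legacy ones, in this order.
NEW_FEATURE_ORDER = [n for names in NEW_FEATURE_GROUPS.values() for n in names]


def columns_without_group(group):
    """Return column indices that remain when one group is left out."""
    dropped = set(NEW_FEATURE_GROUPS[group])
    keep_new = [
        N_LEGACY + i
        for i, name in enumerate(NEW_FEATURE_ORDER)
        if name not in dropped
    ]
    return list(range(N_LEGACY)) + keep_new


def logo_settings(X):
    """Build one reduced feature matrix per left-out group."""
    X = np.asarray(X)
    return {g: X[:, columns_without_group(g)] for g in NEW_FEATURE_GROUPS}
```

Each reduced matrix would then be passed to the same training and evaluation routine, so that any change in performance can be attributed to the omitted group.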
To summarize ablation results:
python summarize_alpha_ablation.py --pattern "logs/ablation_*.out"

To evaluate feature importance through permutation testing:
python main_permutation_new8.py --num_total_runs 100 --num_permute_repeats 10

This evaluates single-feature and group-level importance of the 8 new features by measuring the performance drop after permutation.
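The permutation procedure can be illustrated with a generic sketch. It is not the code in main_permutation_new8.py: `score_fn` is a hypothetical callable that returns a performance score for a fitted model, and a group of columns is shuffled with one shared row order so that relationships within the group are preserved.

```python
import numpy as np


def permutation_drop(score_fn, X, y, columns, n_repeats=10, seed=0):
    """Mean drop in score after shuffling the given columns across rows.

    `columns` may hold one index (single-feature importance) or several
    (group-level importance). A larger drop suggests the model relies
    more on those features.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    columns = list(columns)
    baseline = score_fn(X, y)
    drops = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        order = rng.permutation(X.shape[0])
        # Same row order for every column in the group.
        X_perm[:, columns] = X[order][:, columns]
        drops.append(baseline - score_fn(X_perm, y))
    return float(np.mean(drops))
```

Here `n_repeats` plays the role that --num_permute_repeats appears to play on the command line: averaging over several shuffles reduces the noise of any single permutation.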
The AuxiliaryExperiments folder also contains scripts for subject differentiation and confounder classification performance, both before and after temporal harmonization.
The Data folder includes four helper scripts for topic-based preprocessing and analysis:
- Build session-topic mapping:
  python Data/topic.py
  Output: outputs/session_topics.csv
- Build participant session text (aligned to topics):
  python Data/session.py
  Output: outputs/sessions_with_text.csv
- Compute session-level topic similarity and z-scores:
  python Data/similarity.py
  Outputs: outputs/sessions_with_z_scores.csv, outputs/mci_scores.csv, outputs/nc_scores.csv
- Aggregate to subject-level features:
  python Data/subject.py
  Output: outputs/subject_level_features.csv
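The last two steps, z-scoring session similarities and aggregating them per subject, can be sketched as follows. This is an illustration under stated assumptions, not the logic of Data/similarity.py or Data/subject.py: the z-scores here are computed against all sessions passed in, and the output keys `mean_z` and `drift_flag` as well as the threshold of -1.0 are hypothetical.

```python
from collections import defaultdict

import numpy as np


def z_scores(similarities):
    """Standardize session-topic similarities to zero mean, unit variance."""
    s = np.asarray(similarities, dtype=float)
    std = s.std()
    if std == 0:
        return np.zeros_like(s)
    return (s - s.mean()) / std


def aggregate_by_subject(subject_ids, z, threshold=-1.0):
    """Collapse session-level z-scores into one record per subject.

    Assumption: a subject is flagged when any session falls below
    `threshold`, i.e. is unusually dissimilar to its topic.
    """
    per_subject = defaultdict(list)
    for sid, value in zip(subject_ids, z):
        per_subject[sid].append(float(value))
    return {
        sid: {
            "mean_z": float(np.mean(values)),
            "drift_flag": int(min(values) < threshold),
        }
        for sid, values in per_subject.items()
    }
```

Calling `aggregate_by_subject(ids, z_scores(sims))` yields one dictionary per subject, which mirrors the move from session-level rows to subject-level rows described above.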
Optional plotting for step 4:
python Data/subject.py --plot

The data is available upon request at https://www.i-conect.org/
This material is based in part upon work supported by the National Science Foundation under Grant IIS-2212174, the National Institute on Aging (NIA) under Grant 1RF1AG072449, and the National Institute of General Medical Sciences (NIGMS) under Grant 1R01GM145700.

