Conversational Semantic Dynamics for Mild Cognitive Impairment Detection

Overview

Mild Cognitive Impairment (MCI) is an early stage of cognitive decline for which timely detection is important. Existing language-based approaches for MCI detection have shown promise, but most rely on static summaries of speech or text and may miss how semantic content changes over the course of a conversation. We propose a framework that captures dynamic semantic patterns from clinical conversations, including topic drift, prompt alignment, discourse coherence, and session-level semantic dispersion, and integrates them with conventional linguistic features for MCI detection. Our results show that modeling these semantic dynamics improves performance of MCI detection over the original linguistic baseline. These findings suggest that conversational semantic change may contribute to digital biomarkers for early cognitive decline.

Language Marker Extractor

To extract language markers from the transcripts, first extract syntactic complexity features with the L2 Syntactic Complexity Analyzer. Alternatively, use the GUI from the neosca GitHub repository to extract the same syntactic features.

Next, save the syntactic complexity features to rawdata/syntactic_complexity_measures.csv, place the transcripts in the Transcriptions folder, and run python feature_extractor.py

The script writes the new8-extended feature map to rawdata/id2feature.p (107 dimensions: 99 legacy + 8 new).
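
As a quick sanity check after extraction, the feature map can be loaded and its dimensionality verified. The sketch below assumes that rawdata/id2feature.p is a standard pickle holding a dict from an ID to a flat sequence of 107 numbers; that layout is an assumption and is not confirmed by this README. The helper name check_feature_map is hypothetical.

```python
import pickle

EXPECTED_DIM = 107  # 99 legacy + 8 new, as stated in this README


def check_feature_map(path, expected_dim=EXPECTED_DIM):
    """Load a pickled id -> feature-vector map and list IDs with the wrong length.

    Assumption: the pickle holds a dict whose values are flat sequences.
    Only unpickle files you produced yourself or otherwise trust.
    """
    with open(path, "rb") as f:
        id2feature = pickle.load(f)
    mismatched = [key for key, vec in id2feature.items() if len(vec) != expected_dim]
    return id2feature, mismatched
```

For example, check_feature_map("rawdata/id2feature.p") would return the loaded map together with the IDs whose vectors are not 107-dimensional; an empty list suggests the extension was applied to every entry.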

New 8-Feature Extension

This updated version includes 8 newly developed linguistic features organized into 4 semantic groups:

Feature Groups

Topic Drift (2 features):
- topic_z_score: Z-score normalized topic coherence across conversation sequences
- topic_z_score_indicator: Binary indicator for significant topic drift detection
Prompt Alignment (1 feature):
- prompt_align: Semantic alignment between participant responses and interviewer prompts
Within-Session Coherence (3 features):
- coh_mean: Mean coherence within a conversation session
- coh_var: Variance of coherence within a session
- coh_min: Minimum coherence within a session
Session Semantic Dispersion (2 features):
- sess_var_mean: Mean semantic variance across sessions
- sess_var_norm: Normalized semantic variance across sessions
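
To make the Within-Session Coherence group concrete, here is a minimal sketch under the assumption that coherence is the cosine similarity between embeddings of consecutive participant turns. The embedding model, the turn segmentation, the use of population variance, and the function name coherence_features are all assumptions for illustration; feature_extractor.py may define these quantities differently.

```python
import math
from statistics import mean, pvariance


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coherence_features(turn_embeddings):
    """Summarize similarity between consecutive turns of one session.

    Assumption: turn_embeddings is a list of equal-length numeric vectors,
    one per participant turn, in conversational order.
    """
    sims = [cosine(a, b) for a, b in zip(turn_embeddings, turn_embeddings[1:])]
    if not sims:  # fewer than two turns: nothing to compare
        return {"coh_mean": 0.0, "coh_var": 0.0, "coh_min": 0.0}
    return {
        "coh_mean": mean(sims),
        "coh_var": pvariance(sims),
        "coh_min": min(sims),
    }
```

Under this reading, a session whose turns stay on one subject yields a high coh_mean and a low coh_var, while a single abrupt shift shows up mainly in coh_min.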

Ablation Study

To analyze the contribution of each new feature group, run leave-one-group-out ablation using:

python main_ablation_new8_logo.py --num_total_runs 100

The script automatically evaluates 4 settings:

- leave out topic_drift
- leave out prompt_alignment
- leave out within_session_coherence
- leave out session_semantic_dispersion
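
The idea behind leave-one-group-out can be sketched as a column selection over the 107-dimensional vector. The snippet below assumes that the 8 new features occupy the last 8 positions, in the order listed under Feature Groups; that ordering, the constant names, and the function leave_one_group_out are assumptions for illustration and may not match main_ablation_new8_logo.py.

```python
# Group names follow the four ablation settings; feature names follow Feature Groups.
NEW8_GROUPS = {
    "topic_drift": ["topic_z_score", "topic_z_score_indicator"],
    "prompt_alignment": ["prompt_align"],
    "within_session_coherence": ["coh_mean", "coh_var", "coh_min"],
    "session_semantic_dispersion": ["sess_var_mean", "sess_var_norm"],
}

N_LEGACY = 99  # assumption: legacy features come first, new features last


def leave_one_group_out(left_out):
    """Return the column indices kept when one new-feature group is removed."""
    names = [name for feats in NEW8_GROUPS.values() for name in feats]
    dropped = set(NEW8_GROUPS[left_out])
    kept_new = [N_LEGACY + i for i, name in enumerate(names) if name not in dropped]
    return list(range(N_LEGACY)) + kept_new
```

Each setting keeps all 99 legacy columns and drops only the columns of the named group, so a performance change can be attributed to that group alone.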

To summarize ablation results:

python summarize_alpha_ablation.py --pattern "logs/ablation_*.out"

Permutation Importance Analysis

To evaluate feature importance through permutation testing:

python main_permutation_new8.py --num_total_runs 100 --num_permute_repeats 10

This evaluates single-feature and group-level importance of the 8 new features by measuring performance drop after permutation.
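
For readers unfamiliar with the technique, permutation importance can be sketched in a few lines. This is a generic illustration, not the code in main_permutation_new8.py: the accuracy metric, the predict callable, and the function names are assumptions. Passing one column gives single-feature importance; passing several columns shuffles them jointly and gives a group-level score.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, columns, n_repeats=10, seed=0):
    """Mean accuracy drop after shuffling the given columns across rows.

    The listed columns are permuted together, so correlations inside the
    group are preserved while their link to the labels is broken.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        order = list(range(len(X)))
        rng.shuffle(order)
        X_perm = [list(row) for row in X]  # copy so X is left untouched
        for i, src in enumerate(order):
            for c in columns:
                X_perm[i][c] = X[src][c]
        drops.append(baseline - accuracy(predict, X_perm, y))
    return sum(drops) / n_repeats
```

A feature the model ignores yields a drop of zero, while a feature the model depends on yields a positive drop; averaging over repeats reduces the noise from any single shuffle.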

The AuxiliaryExperiments folder also contains scripts for subject differentiation and confounder classification performance, both before and after temporal harmonization.

Data Helper Scripts

The Data folder includes four helper scripts for topic-based preprocessing and analysis:

Build session-topic mapping:

python Data/topic.py

Output: outputs/session_topics.csv

Build participant session text (aligned to topics):

python Data/session.py

Output: outputs/sessions_with_text.csv

Compute session-level topic similarity and z-scores:

python Data/similarity.py

Outputs:

outputs/sessions_with_z_scores.csv
outputs/mci_scores.csv
outputs/nc_scores.csv
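
The z-score step can be illustrated as follows. The sketch assumes that each session has a single topic-similarity value, that z-scores are taken against the mean and population standard deviation of the supplied values, and that the drift indicator flags sessions whose absolute z-score exceeds a threshold. The reference population, the threshold of 2.0, and both function names are hypothetical; Data/similarity.py may differ on all three.

```python
from statistics import mean, pstdev


def z_scores(similarities):
    """Standardize similarity values against their own mean and std."""
    mu = mean(similarities)
    sigma = pstdev(similarities)
    if sigma == 0:  # all values identical: no session deviates
        return [0.0 for _ in similarities]
    return [(s - mu) / sigma for s in similarities]


def drift_indicator(z_values, threshold=2.0):
    """Binary flag per session: 1 if |z| exceeds the (assumed) threshold."""
    return [int(abs(z) > threshold) for z in z_values]
```

With this convention, a session whose similarity sits far from the reference mean receives a large absolute z-score and is flagged as a candidate topic-drift session.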

Aggregate to subject-level features:

python Data/subject.py

Output: outputs/subject_level_features.csv

Optional plotting for the subject-level aggregation step:

python Data/subject.py --plot
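
The subject-level aggregation can be pictured as a group-by over session rows. The sketch below works on in-memory dicts and assumes hypothetical column names subject_id and z_score along with hypothetical summary statistics; the actual columns in outputs/sessions_with_z_scores.csv and the statistics computed by Data/subject.py are not specified in this README.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_subject(rows, id_key="subject_id", value_key="z_score"):
    """Collapse session-level rows into one summary dict per subject.

    rows: iterable of dicts, e.g. as produced by csv.DictReader.
    """
    by_subject = defaultdict(list)
    for row in rows:
        by_subject[row[id_key]].append(float(row[value_key]))
    return {
        sid: {"n_sessions": len(vals), "mean_z": mean(vals), "min_z": min(vals)}
        for sid, vals in by_subject.items()
    }
```

Because csv.DictReader yields string values, the float conversion inside the loop lets the same function accept rows read directly from a CSV file.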

Data Request

The data is available upon request at https://www.i-conect.org/

Acknowledgement

This material is based in part upon work supported by the National Science Foundation under Grant IIS-2212174, the National Institute on Aging (NIA) under Grant 1RF1AG072449, and the National Institute of General Medical Sciences (NIGMS) under Grant 1R01GM145700.
