
Frailty migration #484

Open: theoimbert-aphp wants to merge 24 commits into master from frailty_migration



@theoimbert-aphp
Collaborator

Summary

This PR implements NER pipelines that detect mentions of frailty across several domains of the Geriatric Assessment, along with pipelines for clinically validated geriatric scores in these domains.

Description

The following new NER components were added:

  • eds.autonomy
  • eds.cognition
  • eds.frailty
  • eds.general_status
  • eds.geriatric_assessment
  • eds.incontinence
  • eds.mobility
  • eds.nutrition
  • eds.pain
  • eds.polymed
  • eds.sensory
  • eds.social
  • eds.thymic
  • eds.adl
  • eds.iadl
  • eds.bref
  • eds.chair_stand
  • eds.en_eva
  • eds.g8
  • eds.gait_speed
  • eds.gds
  • eds.mini_gds
  • eds.mini_cog
  • eds.mms
  • eds.moca
  • eds.ps
  • eds.rockwood
  • eds.sppb
  • eds.tug

These components result from the work done on the PASSAGE cohort and project (article forthcoming).

Behavior

All these pipelines are based on the ContextualMatcher and work like the other NER pipelines already present in EDS-NLP. By default, the "basic" frailty domain pipelines store their matches in doc.ents and doc.spans[{domain}]. The score pipelines store theirs in doc.ents, doc.spans[{score}] and doc.spans[{domain}], where {domain} is the frailty domain the score belongs to. For example, the eds.adl pipeline stores its matches in doc.ents, doc.spans["adl"] and doc.spans["autonomy"].
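To make the storage rules concrete, here is a small plain-Python sketch of the span-group mapping each kind of pipe writes to by default. It assumes the dict form of EDS-NLP's span_setter argument (span group name mapped to True); the helper default_span_setter is hypothetical and not part of this PR.

```python
from typing import Dict, Optional


def default_span_setter(domain: str, score: Optional[str] = None) -> Dict[str, bool]:
    """Hypothetical helper mirroring the storage rules of the frailty pipes.

    Domain pipes (e.g. eds.autonomy) write to doc.ents and doc.spans[domain];
    score pipes (e.g. eds.adl) additionally write to doc.spans[score].
    """
    setter = {"ents": True}
    if score is not None:
        # Score pipes get their own span group...
        setter[score] = True
    # ...and every pipe also feeds the span group of its frailty domain.
    setter[domain] = True
    return setter


print(default_span_setter("autonomy"))
# {'ents': True, 'autonomy': True}
print(default_span_setter("autonomy", score="adl"))
# {'ents': True, 'adl': True, 'autonomy': True}
```

If the frailty pipes expose span_setter like other EDS-NLP NER pipes (an assumption here), a mapping of this shape is what one would pass to override the defaults.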

Each match also gets a severity attribute named after its domain (for instance span._.autonomy for matches of eds.adl). Here is an example:

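```python
import edsnlp
import edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.normalizer())
nlp.add_pipe(eds.sentences())
nlp.add_pipe(eds.adl())

doc = nlp("ADL 6/6")

# Score matches are stored in the score's own span group...
spans = doc.spans["adl"]
spans
# Out: ['ADL 6/6']

# ...and carry a severity attribute for their domain (here, autonomy)
span = spans[0]
span._.autonomy
# Out: 'healthy'
```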
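Before assigning a severity, a score pipe has to turn a raw mention like "ADL 6/6" into a number (the coverage report below shows a configurable score_normalization hook in scores/base.py). The sketch below illustrates such a normalization step in plain Python; the function name normalize_fraction_score, its regex and the max_value check are assumptions for illustration, not the PR's implementation.

```python
import re
from typing import Optional

# Hypothetical pattern: a value (optionally with a "." or "," decimal part,
# e.g. half points), a slash, then an integer denominator ("6/6", "5,5 / 6").
FRACTION = re.compile(r"(\d+(?:[.,]\d+)?)\s*/\s*(\d+)")


def normalize_fraction_score(text: str, max_value: int) -> Optional[float]:
    """Return the value of an "X/Y" score if Y == max_value and X <= Y.

    Illustrative sketch only: the actual EDS-NLP score pipes may parse
    and validate values differently.
    """
    match = FRACTION.search(text)
    if match is None:
        return None
    value = float(match.group(1).replace(",", "."))
    denominator = int(match.group(2))
    # Reject fractions on another scale (e.g. "ADL 4/8") or impossible values.
    if denominator != max_value or value > denominator:
        return None
    return value


print(normalize_fraction_score("ADL 6/6", max_value=6))
# 6.0
```

Keeping the parsing separate from the severity thresholds makes each part testable on its own.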

Tests

Added unit tests for all these pipelines.

Documentation

Added documentation for all these pipelines. It may need to be updated later, notably once the article is published, and to add more information on the scores.

Checklist

  • If this PR is a bug fix, the bug is documented in the test suite.
  • Changes were documented in the changelog (pending section).
  • If necessary, changes were made to the documentation (e.g. new pipeline).

@github-actions
Contributor

github-actions Bot commented Mar 16, 2026

Docs preview URL

https://edsnlp-frailty-migration.vercel.app

theoimbert-aphp mentioned this pull request on Mar 23, 2026
percevalw force-pushed the frailty_migration branch 3 times, most recently from d44a265 to 0421e77, on March 27, 2026 at 10:00
@github-actions
Contributor

github-actions Bot commented Mar 27, 2026

Coverage Report

Name · Stmts · Miss · ∆ Miss · Cover
edsnlp/pipes/ner/frailty/utils.py

New missing coverage at line 131 !

             if current_window[0] < window[0]:
-                 window[0] = current_window[0]
             if current_window[1] > window[1]:
New missing coverage at line 133 !
             if current_window[1] > window[1]:
-                 window[1] = current_window[1]
     new_regex = make_assign_regex(regex_list)

Stmts 41 · Miss 2 · ∆ Miss 2 · Cover 95.12%
edsnlp/pipes/ner/frailty/scores/base.py

New missing coverage at line 76 !

         if isinstance(score_normalization, str):
-             self.score_normalization = registry.misc.get(score_normalization)
         else:
New missing coverage at line 183 !
             if assign.span_getter is not None:
-                 assigned = [
                     (matched_span, matched_span)

Stmts 91 · Miss 2 · ∆ Miss 2 · Cover 97.80%
edsnlp/pipes/ner/frailty/base.py

New missing coverage at line 50 !

         if name is None:
-             name = domain
         if label is None:
New missing coverage at line 52 !
         if label is None:
-             label = domain
         self.domain = domain

Stmts 39 · Miss 2 · ∆ Miss 2 · Cover 94.87%
edsnlp/pipes/ner/frailty/scores/iadl_score/factory.py

New missing coverage at line 25 !

     elif value == 0:
-         return "altered_severe"
     else:

Stmts 20 · Miss 1 · ∆ Miss 1 · Cover 95.00%
edsnlp/pipes/ner/frailty/scores/chair_stand_score/factory.py

New missing coverage at line 31 !

     if kept_value is None:
-         return

Stmts 36 · Miss 1 · ∆ Miss 1 · Cover 97.22%
TOTAL · Stmts 14453 · Miss 265 · ∆ Miss 4 · Cover 98.17%
Files without new missing coverage
Name · Stmts · Miss · ∆ Miss · Cover
edsnlp/utils/typing.py

Was already missing at line 44

     def __get_validators__(cls):
-         yield cls.validate

Stmts 48 · Miss 1 · ∆ Miss 0 · Cover 97.92%
edsnlp/utils/torch.py

Was already missing at line 102

 def load_pruned_obj(obj, _):
-     return obj
Was already missing at line 118
     def save_align_devices_hook(pickler, obj):
-         pickler.save_reduce(load_align_devices_hook, (obj.__dict__,), obj=obj)
Was already missing at lines 121-128
     def load_align_devices_hook(state):
-         state["execution_device"] = MAP_LOCATION
  ...
-     AlignDevicesHook = None
Was already missing at line 143
             if torch.Tensor in copyreg.dispatch_table:
-                 old_dispatch[torch.Tensor] = copyreg.dispatch_table[torch.Tensor]
             copyreg.pickle(torch.Tensor, reduce_empty)

Stmts 81 · Miss 9 · ∆ Miss 0 · Cover 88.89%
edsnlp/utils/span_getters.py

Was already missing at lines 73-75

     if span_getter is None:
-         yield doclike[:], None
-         return
     if callable(span_getter):
Was already missing at lines 76-78
     if callable(span_getter):
-         yield from span_getter(doclike)
-         return
     for key, span_filter in span_getter.items():
Was already missing at lines 99-102
         else:
-             for span, group in candidates:
-                 if span.label_ in span_filter:
-                     yield span, group
Was already missing at line 107
     if callable(span_setter):
-         span_setter(doc, matches)
     else:
Was already missing at line 138
     if callable(value):
-         return value
     if isinstance(value, str):
Was already missing at line 187
             elif isinstance(v, str):
-                 new_value[k] = [v]
             elif isinstance(v, list) and all(isinstance(i, str) for i in v):

Stmts 241 · Miss 10 · ∆ Miss 0 · Cover 95.85%
edsnlp/utils/resources.py

Was already missing at line 33

     if not verbs:
-         return conjugated_verbs

Stmts 25 · Miss 1 · ∆ Miss 0 · Cover 96.00%
edsnlp/utils/numbers.py

Was already missing at line 34

     else:
-         string = s
     string = string.lower().strip()
Was already missing at lines 38-41
         return int(string)
-     except ValueError:
-         parsed = DIGITS_MAPPINGS.get(string, None)
-         return parsed

Stmts 16 · Miss 4 · ∆ Miss 0 · Cover 75.00%
edsnlp/utils/fuzzy_alignment.py

Was already missing at line 70

         if len(other.begins) == 0:
-             return self
         begins = self.unapply(other.begins, side="left")

Stmts 191 · Miss 1 · ∆ Miss 0 · Cover 99.48%
edsnlp/utils/filter.py

Was already missing at line 206

     if isinstance(label, int):
-         return [span for span in spans if span.label == label]
     else:

Stmts 74 · Miss 1 · ∆ Miss 0 · Cover 98.65%
edsnlp/utils/file_system.py

Was already missing at line 39

     if isinstance(filesystem, str):
-         filesystem = fsspec.filesystem(filesystem)
Was already missing at line 50
         if not isinstance(inferred_protocols, (list, tuple, set)):
-             inferred_protocols = [inferred_fs.protocol]
         if not isinstance(filesystem_protocols, (list, tuple, set)):
Was already missing at line 52
         if not isinstance(filesystem_protocols, (list, tuple, set)):
-             filesystem_protocols = [filesystem.protocol]
         assert set(filesystem_protocols) & set(inferred_protocols), (

Stmts 31 · Miss 3 · ∆ Miss 0 · Cover 90.32%
edsnlp/training/trainer.py

Was already missing at line 59

     if result is None:
-         result = {}
     if isinstance(x, dict):
Was already missing at line 118
             # fmt: off
-             autocast = {
                "fp16": torch.float16, "float16": torch.float16,
Was already missing at lines 379-385
         if self.sub_batch_size and self.sub_batch_size[1] == "splits":
-             data = data.batchify(
  ...
-             data = data.map(lambda b: [nlp.collate(sb, device=device) for sb in b])
         elif self.sub_batch_size:
Was already missing at lines 938-945
                         raise
-                     except Exception:
  ...
-                         raise
Was already missing at lines 972-974
                     ) > grad_max_dev * math.sqrt(grad_var):
-                         spike = True
-                         spikes += 1
                     else:
Was already missing at line 981
                     if spike and grad_dev_policy == "clip_mean":
-                         torch.nn.utils.clip_grad_norm_(
                             grad_params, grad_mean, norm_type=2
Was already missing at line 985
                     elif spike and grad_dev_policy == "clip_threshold":
-                         torch.nn.utils.clip_grad_norm_(
                             grad_params,

Stmts 357 · Miss 13 · ∆ Miss 0 · Cover 96.36%
edsnlp/training/loggers.py

Was already missing at line 65

         if self._file is not None:
-             return
         os.makedirs(self.logging_dir, exist_ok=True)
Was already missing at line 102
                 if col not in values and col != "step":
-                     row.append("")
                 else:
Was already missing at line 225
     def tracker(self):
-         return self.printer
Was already missing at line 293
             )
-             logging_dir = env_logging_dir
         assert logging_dir is not None, (

Stmts 160 · Miss 4 · ∆ Miss 0 · Cover 97.50%
edsnlp/reducers.py

Was already missing at line 115

     if not hasattr(module, "__file__"):
-         return True
     if module.__file__ is None:
Was already missing at line 117
     if module.__file__ is None:
-         return False
     # Hack to avoid copying the full module dict

Stmts 68 · Miss 2 · ∆ Miss 0 · Cover 97.06%
edsnlp/processing/spark.py

Was already missing at line 50

         getActiveSession = SparkSession.getActiveSession
-     except AttributeError:

Stmts 47 · Miss 1 · ∆ Miss 0 · Cover 97.87%
edsnlp/processing/multiprocessing.py

Was already missing at lines 222-230

                     return re.findall(r"/[^\s]+\.so[^\s]*", f.read())
-             except Exception:
  ...
-             return []
Was already missing at lines 233-235
         loaded = loaded_libs()
-     except Exception:
-         return False
     return any(any(k in os.path.basename(p).lower() for k in libs) for p in loaded)
Was already missing at line 254
         )
-         method = "spawn"
Was already missing at lines 258-264
     if has_hdfs and method == "fork":
-         safe = "forkserver" if "forkserver" in methods else "spawn"
  ...
-         method = safe
Was already missing at lines 454-457
                 for _ in self.iter_tasks(stage=stage, stop_mode=True):
-                     pass
-             except StopSignal:
-                 pass
             for name, queue in self.consumer_queues(stage):
Was already missing at lines 672-674
             if isinstance(docs, StreamSentinel):
-                 self.active_batches[stage].append([None, None, None, docs])
-                 continue
             batch_id = str(hash(tuple(id(x) for x in docs)))[-8:] + "-" + self.uid
Was already missing at line 1144
             if self.error:
-                 raise self.error
         finally:
Was already missing at lines 1202-1208
                 if out[0].kind == requires_sentinel:
-                     missing_sentinels -= 1
  ...
-                         missing_sentinels = len(self.cpu_worker_names)
                 continue

Stmts 671 · Miss 24 · ∆ Miss 0 · Cover 96.42%
edsnlp/processing/deprecated_pipe.py

Was already missing at lines 207-209

         def converter(doc):
-             res = results_extractor(doc)
-             return (
                 [{"note_id": doc._.note_id, **row} for row in res]

Stmts 53 · Miss 2 · ∆ Miss 0 · Cover 96.23%
edsnlp/pipes/trainable/span_linker/span_linker.py

Was already missing at lines 402-404

             if self.reference_mode == "synonym":
-                 embeds = embeds.to(new_lin.weight)
-                 new_lin.weight.data = embeds
             else:

Stmts 175 · Miss 2 · ∆ Miss 0 · Cover 98.86%
edsnlp/pipes/trainable/span_classifier/span_classifier.py

Was already missing at line 380

         if not all(keep_bindings):
-             logger.warning(
                 "Some attributes have no labels or values and have been removed:"

Stmts 171 · Miss 1 · ∆ Miss 0 · Cover 99.42%
edsnlp/pipes/trainable/ner_crf/ner_crf.py

Was already missing at line 302

         if self.labels is not None and not self.infer_span_setter:
-             return
Was already missing at lines 310-312
             if callable(self.target_span_getter):
-                 for span in get_spans(doc, self.target_span_getter):
-                     inferred_labels.add(span.label_)
             else:
Was already missing at line 447
             )
-             self._has_warned = True

Stmts 177 · Miss 4 · ∆ Miss 0 · Cover 97.74%
edsnlp/pipes/trainable/layers/crf.py

Was already missing at line 80

         if learnable_transitions:
-             self.transitions = torch.nn.Parameter(
                 torch.zeros_like(forbidden_transitions, dtype=torch.float)
Was already missing at line 90
         if learnable_transitions and with_start_end_transitions:
-             self.start_transitions = torch.nn.Parameter(
                 torch.zeros(num_tags, dtype=torch.float)
Was already missing at line 99
         if learnable_transitions and with_start_end_transitions:
-             self.end_transitions = torch.nn.Parameter(
                 torch.zeros(num_tags, dtype=torch.float)

Stmts 138 · Miss 3 · ∆ Miss 0 · Cover 97.83%
edsnlp/pipes/trainable/embeddings/transformer/transformer.py

Was already missing at line 167

         if quantization is not None:
-             kwargs["quantization_config"] = quantization
Was already missing at line 192
         if self.cls_token_id is None:
-             [self.cls_token_id] = self.tokenizer.convert_tokens_to_ids(
                 [self.tokenizer.special_tokens_map["bos_token"]]
Was already missing at line 196
         if self.sep_token_id is None:
-             [self.sep_token_id] = self.tokenizer.convert_tokens_to_ids(
                 [self.tokenizer.special_tokens_map["eos_token"]]

Stmts 168 · Miss 3 · ∆ Miss 0 · Cover 98.21%
edsnlp/pipes/qualifiers/reported_speech/reported_speech.py

Was already missing at lines 24-28

         return "REPORTED"
-     elif token._.rspeech is False:
-         return "DIRECT"
-     else:
-         return None

Stmts 100 · Miss 3 · ∆ Miss 0 · Cover 97.00%
edsnlp/pipes/qualifiers/negation/negation.py

Was already missing at line 28

     else:
-         return None

Stmts 101 · Miss 1 · ∆ Miss 0 · Cover 99.01%
edsnlp/pipes/qualifiers/hypothesis/hypothesis.py

Was already missing at line 27

     else:
-         return None

Stmts 98 · Miss 1 · ∆ Miss 0 · Cover 98.98%
edsnlp/pipes/qualifiers/history/history.py

Was already missing at lines 26-32

 def history_getter(token: Union[Token, Span]) -> Optional[str]:
-     if token._.history is True:
-         return "ATCD"
-     elif token._.history is False:
-         return "CURRENT"
-     else:
-         return None
Was already missing at lines 351-357
                 )
-             except ValueError:
  ...
-                 note_datetime = None
Was already missing at lines 366-372
                 )
-             except ValueError:
  ...
-                 birth_datetime = None
Was already missing at lines 438-441
                         )
-                     except ValueError as e:
-                         absolute_date = None
-                         logger.warning(
                             "In doc {}, the following date {} raises this error: {}. "

Stmts 180 · Miss 14 · ∆ Miss 0 · Cover 92.22%
edsnlp/pipes/qualifiers/family/family.py

Was already missing at line 27

     else:
-         return None

Stmts 83 · Miss 1 · ∆ Miss 0 · Cover 98.80%
edsnlp/pipes/qualifiers/base.py

Was already missing at line 123

             elif on_ents_only is not True:
-                 assert span_getter is None, (
                     "Cannot use both `span_getter` and `on_ents_only` as a span "

Stmts 52 · Miss 1 · ∆ Miss 0 · Cover 98.08%
edsnlp/pipes/ner/tnm/model.py

Was already missing at line 143

     def __str__(self):
-         return self.norm()
Was already missing at line 167
             )
-             exclude_unset = skip_defaults

Stmts 109 · Miss 2 · ∆ Miss 0 · Cover 98.17%
edsnlp/pipes/ner/scores/sofa/sofa.py

Was already missing at line 32

             if not assigned:
-                 continue
             if assigned.get("method_max") is not None:
Was already missing at line 40
             else:
-                 method = "Non précisée"

Stmts 25 · Miss 2 · ∆ Miss 0 · Cover 92.00%
edsnlp/pipes/ner/scores/elston_ellis/patterns.py

Was already missing at line 26

         if x <= 5:
-             return 1
Was already missing at lines 32-36
         else:
-             return 3
- 
-     except ValueError:
-         return None

Stmts 21 · Miss 4 · ∆ Miss 0 · Cover 80.95%
edsnlp/pipes/ner/scores/charlson/patterns.py

Was already missing at lines 21-23

             return int(extracted_score)
-     except ValueError:
-         return None

Stmts 13 · Miss 2 · ∆ Miss 0 · Cover 84.62%
edsnlp/pipes/ner/disorders/solid_tumor/solid_tumor.py

Was already missing at lines 131-137

         for span in spans:
-             span.label_ = "solid_tumor"
  ...
-             yield span

Stmts 38 · Miss 6 · ∆ Miss 0 · Cover 84.21%
edsnlp/pipes/ner/disorders/peripheral_vascular_disease/peripheral_vascular_disease.py

Was already missing at line 108

                 if "peripheral" not in span._.assigned.keys():
-                     continue

Stmts 16 · Miss 1 · ∆ Miss 0 · Cover 93.75%
edsnlp/pipes/ner/disorders/diabetes/diabetes.py

Was already missing at line 131

                 # Mostly FP
-                 continue
Was already missing at line 134
             elif self.has_far_complications(span):
-                 span._.status = 2
Was already missing at line 145
         if next(iter(self.complication_matcher(context)), None) is not None:
-             return True
         return False

Stmts 30 · Miss 3 · ∆ Miss 0 · Cover 90.00%
edsnlp/pipes/ner/disorders/connective_tissue_disease/connective_tissue_disease.py

Was already missing at line 104

                 # Huge change of FP / Title section
-                 continue

Stmts 15 · Miss 1 · ∆ Miss 0 · Cover 93.33%
edsnlp/pipes/ner/disorders/ckd/ckd.py

Was already missing at lines 121-124

             dfg_value = float(dfg_span.text.replace(",", ".").strip())
-         except ValueError:
-             logger.trace(f"DFG value couldn't be extracted from {dfg_span.text}")
-             return False

Stmts 30 · Miss 3 · ∆ Miss 0 · Cover 90.00%
edsnlp/pipes/ner/disorders/cerebrovascular_accident/cerebrovascular_accident.py

Was already missing at lines 112-114

             if span._.source == "ischemia":
-                 if "brain" not in span._.assigned.keys():
-                     continue

Stmts 18 · Miss 2 · ∆ Miss 0 · Cover 88.89%
edsnlp/pipes/ner/disorders/base.py

Was already missing at lines 119-122

             if span._.status is not None and span._.status not in all_detailed_status:
-                 default_status = 1 if 1 in all_detailed_status else None
  ...
-                 span._.status = default_status
             span._.detailed_status = self.detailed_status_mapping.get(

Stmts 31 · Miss 2 · ∆ Miss 0 · Cover 93.55%
edsnlp/pipes/ner/adicap/models.py

Was already missing at line 15

     def norm(self) -> str:
-         return self.code
Was already missing at line 18
     def __str__(self):
-         return self.norm()

Stmts 14 · Miss 2 · ∆ Miss 0 · Cover 85.71%
edsnlp/pipes/misc/split/split.py

Was already missing at lines 186-188

         if max_length <= 0 and self.regex is None:
-             yield doc
-             return

Stmts 74 · Miss 2 · ∆ Miss 0 · Cover 97.30%
edsnlp/pipes/misc/sections/sections.py

Was already missing at line 126

         if sections is None:
-             sections = patterns.sections
         sections = dict(sections)

Stmts 46 · Miss 1 · ∆ Miss 0 · Cover 97.83%
edsnlp/pipes/misc/quantities/quantities.py

Was already missing at lines 195-197

     def __getitem__(self, item: int):
-         assert isinstance(item, int)
-         return [self][item]
Was already missing at lines 209-215
     def __eq__(self, other: Any):
-         if isinstance(other, SimpleQuantity):
  ...
-         return False
Was already missing at line 218
         if other.unit == self.unit:
-             return SimpleQuantity(
                 self.value + other.value,
Was already missing at line 272
     def verify(cls, ent):
-         return True
Was already missing at line 338
     def __lt__(self, other: Union[SimpleQuantity, "RangeQuantity"]):
-         return max(self.convert_to(other.unit)) < min((part.value for part in other))
Was already missing at line 361
             return self.convert_to(other.unit) == other.value
-         return False
Was already missing at line 375
     def verify(cls, ent):
-         return True
Was already missing at line 1357
         if snippet.end != last and doclike.doc[last : snippet.end].text.strip() == "":
-             pseudo.append("w")
         pseudo = "".join(pseudo)
Was already missing at lines 1738-1742
                         ):
-                             unitless_pattern = self.unitless_patterns[
  ...
-                             unit_norm = next(
                                 scope["unit"]
Was already missing at line 1783
             ):
-                 ent = doc[min(ent_start, unit_text.start) : number.end]
             else:

Stmts 702 · Miss 14 · ∆ Miss 0 · Cover 98.01%
edsnlp/pipes/misc/dates/models.py

Was already missing at line 157

                     else:
-                         d["month"] = note_datetime.month
                 if self.day is None:
Was already missing at lines 161-167
             else:
-                 if self.year is None:
  ...
-                     d["day"] = default_day
Was already missing at lines 175-177
                 return dt
-             except ValueError:
-                 return None
Was already missing at line 193
         else:
-             return None
Was already missing at line 209
         if self.second:
-             norm += f"{self.second:02}s"

Stmts 203 · Miss 11 · ∆ Miss 0 · Cover 94.58%
edsnlp/pipes/misc/dates/dates.py

Was already missing at line 249

         if isinstance(absolute, str):
-             absolute = [absolute]
         if isinstance(relative, str):
Was already missing at line 251
         if isinstance(relative, str):
-             relative = [relative]
         if isinstance(duration, str):
Was already missing at line 253
         if isinstance(duration, str):
-             relative = [duration]
         if isinstance(false_positive, str):
Was already missing at lines 357-366
             if self.merge_mode == "align":
-                 alignments = align_spans(matches, spans, sort_by_overlap=True)
  ...
-                         matches.append(span)
Was already missing at lines 462-464
                 if v1.mode == Mode.DURATION:
-                     m1 = Bound.FROM if v2.bound == Bound.UNTIL else Bound.UNTIL
-                     m2 = v2.mode or Bound.FROM
                 elif v2.mode == Mode.DURATION:

Stmts 153 · Miss 14 · ∆ Miss 0 · Cover 90.85%
edsnlp/pipes/misc/consultation_dates/consultation_dates.py

Was already missing at line 131

         else:
-             self.date_matcher = None
Was already missing at line 134
         if not consultation_mention:
-             consultation_mention = []
         elif consultation_mention is True:

Stmts 48 · Miss 2 · ∆ Miss 0 · Cover 95.83%
edsnlp/pipes/llm/llm_span_qualifier/llm_span_qualifier.py

Was already missing at line 579

         if isinstance(formatted_context, Doc):
-             context_text = formatted_context.text
         else:
Was already missing at line 657
             if start == -1 or end <= 0 or end <= start:
-                 return None
             try:
Was already missing at line 750
             if next_yield >= len(doc_states):
-                 return
             for state in doc_states[next_yield:]:

Stmts 249 · Miss 3 · ∆ Miss 0 · Cover 98.80%
edsnlp/pipes/llm/llm_markup_extractor/llm_markup_extractor.py

Was already missing at line 309

         if seed is not None:
-             api_kwargs["seed"] = seed
         self.retriever = None
Was already missing at line 355
             if span is None:
-                 continue
             spans.append(span)
Was already missing at lines 467-469
                 if not contexts:
-                     remaining_ctx_counts[doc_idx] = 0
-                     buffer[doc_idx] = doc
                 else:
Was already missing at line 490
             if result is None:
-                 pass
             else:

Stmts 157 · Miss 5 · ∆ Miss 0 · Cover 96.82%
edsnlp/pipes/core/normalizer/__init__.py

Was already missing at line 7

 def excluded_or_space_getter(t):
-     return t.is_space or t.tag_ == "EXCLUDED"

Stmts 5 · Miss 1 · ∆ Miss 0 · Cover 80.00%
edsnlp/pipes/core/endlines/endlines.py

Was already missing at lines 160-164

         if end_lines_model is None:
-             path = build_path(__file__, "base_model.pkl")
- 
-             with open(path, "rb") as inp:
-                 self.model = pickle.load(inp)
         elif isinstance(end_lines_model, str):
Was already missing at lines 167-169
                 self.model = pickle.load(inp)
-         elif isinstance(end_lines_model, EndLinesModel):
-             self.model = end_lines_model
         else:
Was already missing at line 200
         ):
-             return "ENUMERATION"
Was already missing at line 287
         if np.isnan(sigma):
-             sigma = 1

Stmts 89 · Miss 7 · ∆ Miss 0 · Cover 92.13%
edsnlp/patch_spacy.py

Was already missing at lines 67-69

             # if module is reloaded.
-             existing_func = registry.factories.get(internal_name)
-             if not util.is_same_func(factory_func, existing_func):
                 raise ValueError(

Stmts 31 · Miss 2 · ∆ Miss 0 · Cover 93.55%
edsnlp/package.py

Was already missing at lines 474-476

             version = version or pyproject["project"]["version"]
-         except (KeyError, TypeError):
-             version = "0.1.0"
         name = name or pyproject["project"]["name"]
Was already missing at line 480
         else:
-             main_package = None
         model_package = snake_case(name.lower())

Stmts 214 · Miss 3 · ∆ Miss 0 · Cover 98.60%
edsnlp/metrics/span_attribute.py

Was already missing at lines 68-70

         )
-         assert attributes is None
-         attributes = kwargs.pop("qualifiers")
     if attributes is None:

Stmts 93 · Miss 2 · ∆ Miss 0 · Cover 97.85%
edsnlp/matchers/simstring.py

Was already missing at line 280

     if custom:
-         attr = attr[1:].lower()
Was already missing at line 295
             if custom:
-                 token_text = getattr(token._, attr)
             else:

Stmts 146 · Miss 2 · ∆ Miss 0 · Cover 98.63%
edsnlp/language.py

Was already missing at line 103

             if last != begin:
-                 logger.warning(
                     "Missed some characters during"

Stmts 52 · Miss 1 · ∆ Miss 0 · Cover 98.08%
edsnlp/data/standoff.py

Was already missing at line 38

     def __init__(self, ann_file, line):
-         super().__init__(f"File {ann_file}, unrecognized Brat line {line}")
Was already missing at line 192
                         )
-                 except Exception:
                     raise Exception(

Stmts 186 · Miss 2 · ∆ Miss 0 · Cover 98.92%
edsnlp/data/polars.py

Was already missing at line 36

         if hasattr(data, "collect"):
-             data = data.collect()
         assert isinstance(data, pl.DataFrame)

Stmts 55 · Miss 1 · ∆ Miss 0 · Cover 98.18%
edsnlp/data/json.py

Was already missing at line 81

                 return records
-         except Exception as e:
             raise Exception(f"Cannot read {file}: {e}")

Stmts 112 · Miss 1 · ∆ Miss 0 · Cover 99.11%
edsnlp/data/huggingface_dataset.py

Was already missing at line 259

         if isinstance(item, DatasetEndSentinel):
-             continue
         else:

Stmts 99 · Miss 1 · ∆ Miss 0 · Cover 98.99%
edsnlp/data/converters.py

Was already missing at line 427

                 elif key == "XPOS":
-                     word.tag_ = value
                 elif key == "FEATS":
Was already missing at line 835
         if self.keep_raw_attribute_values:
-             return value
         try:
Was already missing at line 897
                 if not attr:
-                     continue
                 if "=" in attr:
Was already missing at line 928
             if span is None:
-                 continue
             for k, v in attrs.items():
Was already missing at line 998
         if isinstance(value, (bool, int, float)):
-             return repr(value)
         s = str(value)
Was already missing at line 1307
                 if current_type is not None:
-                     entities.append((start_idx, i, current_type))
                 start_idx = i
Was already missing at line 1401
             if start < 0 or start >= len(tags):
-                 continue
             tags[start] = f"B-{label}"
Was already missing at line 1445
     if isinstance(converter, type):
-         return converter(**kwargs), {}
     return converter, validate_kwargs(converter, kwargs)

Stmts 500 · Miss 8 · ∆ Miss 0 · Cover 98.40%
edsnlp/data/conll.py

Was already missing at lines 81-83

             )
-         except StopIteration:
-             cols = DEFAULT_COLUMNS
             warnings.warn(
Was already missing at lines 92-96
         if not line:
-             if doc["words"]:
-                 yield doc
-                 doc = {"words": []}
-             continue
         if line.startswith("#"):

Stmts 76 · Miss 6 · ∆ Miss 0 · Cover 92.11%
edsnlp/core/torch_component.py

Was already missing at line 407

             if hasattr(self, "compiled"):
-                 res = self.compiled(batch)
             else:
Was already missing at line 453
         """
-         return self.preprocess(doc)

Stmts 190 · Miss 2 · ∆ Miss 0 · Cover 98.95%
edsnlp/core/stream.py

Was already missing at line 155

             else:
-                 yield res
             return
Was already missing at lines 203-205
                 if isinstance(batch, StreamSentinel):
-                     yield batch
-                     continue
                 results = []
Was already missing at lines 1030-1032
                 elif op.batch_fn is None:
-                     batch_size = op.size
-                     batch_fn = batchify
                 else:

Stmts 383 · Miss 5 · ∆ Miss 0 · Cover 98.69%
edsnlp/core/registries.py

Was already missing at line 138

         if isinstance(obj, DraftPipe):
-             return obj
         elif isinstance(obj, dict):
Was already missing at line 143
                 if result is not None:
-                     return result
         elif isinstance(obj, (tuple, list, set)):
Was already missing at line 148
                 if result is not None:
-                     return result
         return None

Stmts 221 · Miss 3 · ∆ Miss 0 · Cover 98.64%
edsnlp/core/pipeline.py

Was already missing at line 607

             if name in exclude:
-                 continue
             if name not in components:
Was already missing at lines 715-718
         """
-         res = Stream.ensure_stream(docs)
-         res = res.map(functools.partial(self.preprocess, supervision=supervision))
-         return res

Stmts 463 · Miss 4 · ∆ Miss 0 · Cover 99.14%
edsnlp/connectors/omop.py

Was already missing at line 69

         if not isinstance(row.ents, list):
-             continue
Was already missing at line 87
             else:
-                 doc.spans[span.label_].append(span)
Was already missing at line 127
     if df.note_id.isna().any():
-         df["note_id"] = range(len(df))
Was already missing at line 171
         if i > 0:
-             df.term_modifiers += ";"
         df.term_modifiers += ext + "=" + df[ext].astype(str)

Stmts 84 · Miss 4 · ∆ Miss 0 · Cover 95.24%
edsnlp/_version.py

Was already missing at line 21

     if repo_root is None:
-         return base_version
Was already missing at line 39
-     return (
         base_version

Stmts 15 · Miss 2 · ∆ Miss 0 · Cover 86.67%
edsnlp/tune.py

Was already missing at line 221

         return logger_config.get("@loggers") == "json"
-     return False
Was already missing at line 291
                 continue
-             raise
         for feature, importance in importance_scores.items():
Was already missing at line 399
             ):
-                 resolved_key = int(key)
             try:
Was already missing at lines 650-652
             return os.path.join(info.root_dir, metrics_relpath)
-         time.sleep(1)
-     return None
Was already missing at line 834
             if was_pruned:
-                 _handle_pruned_dvc_runs(queue, entries, entry)
             if not (result and result.exp_hash and result.ref_info):
Was already missing at line 1062
         else:
-             config = copy.deepcopy(raw_config)
         updated_config = update_config(
Was already missing at line 1197
     else:
-         config_path_phase_2 = os.path.join(output_dir_phase_1, "config.cfg")

Stmts 626 · Miss 8 · ∆ Miss -2 · Cover 98.72%

380 files skipped due to complete coverage.

Coverage success: total of 98.17% is above 98.07% 🎉
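For reference, the Cover column is the share of statements executed, (Stmts - Miss) / Stmts. The snippet below recomputes the TOTAL row (14453 statements, 265 missed) and how many missed statements the suite could absorb before dropping under the 98.07% threshold. It assumes the threshold is compared against the unrounded percentage at the current statement count; the helper names are illustrative.

```python
def coverage_percent(statements: int, missed: int) -> float:
    """Percentage of statements executed by the test suite."""
    return 100 * (statements - missed) / statements


def max_missed(statements: int, threshold_bp: int) -> int:
    """Largest number of missed statements keeping coverage >= threshold.

    threshold_bp is the threshold in basis points (98.07% -> 9807), so the
    comparison (S - m) / S >= t / 10000 is solved in exact integer arithmetic.
    """
    return statements * (10_000 - threshold_bp) // 10_000


total_statements, total_missed = 14453, 265
print(f"{coverage_percent(total_statements, total_missed):.2f}%")  # 98.17%
print(max_missed(total_statements, 9807))  # 278
```

Under these assumptions, the branch has a margin of 13 missed statements (278 - 265) before the coverage check would fail.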

@percevalw
Member

Hi @theoimbert-aphp, congratulations on this massive work!

  1. Names:

Some component names are a bit short and ambiguous (e.g., eds.ps). I suggest making them more explicit, e.g., eds.ecog_performance_status_score instead of eds.ps.

For example:

  • eds.autonomy -> eds.autonomy_status
  • eds.cognition -> eds.cognitive_status
  • eds.frailty -> eds.frailty
  • eds.general_status -> eds.global_health_status
  • eds.geriatric_assessment -> eds.geriatric_assessment
  • eds.incontinence -> eds.incontinence_status
  • eds.mobility -> eds.mobility_status
  • eds.nutrition -> eds.nutritional_status
  • eds.pain -> eds.pain_status
  • eds.polymed -> eds.polypharmacy_status
  • eds.sensory -> eds.sensory_status
  • eds.social -> eds.social_status
  • eds.thymic -> eds.mood_status
  • eds.adl -> eds.adl_score
  • eds.bref -> eds.bref_score
  • eds.chair_stand -> eds.chair_stand_score
  • eds.en_eva -> eds.pain_rating_score
  • eds.g8 -> eds.g8_score
  • eds.gait_speed -> eds.gait_speed_score
  • eds.gds -> eds.geriatric_depression_scale_score
  • eds.iadl -> eds.iadl_score
  • eds.mini_cog -> eds.mini_cog_score
  • eds.mini_gds -> eds.mini_gds_score
  • eds.mms -> eds.mini_mental_state_score
  • eds.moca -> eds.moca_score
  • eds.ps -> eds.ecog_performance_status_score
  • eds.rockwood -> eds.clinical_frailty_scale_score
  • eds.sppb -> eds.sppb_score
  • eds.tug -> eds.timed_up_and_go_score

What do you think about this? We can discuss this mapping here, and I can handle the refactoring if you want.

  2. Coverage:

Some pipes lack test coverage, which could easily be improved by adding targeted tests. Could you handle that?

To use the rebased PR, run the following commands:

⚠️ This will erase any local changes made to the branch.

git fetch
git reset --hard origin/frailty_migration

@theoimbert-aphp
Collaborator Author

theoimbert-aphp commented Mar 27, 2026

Hi @percevalw, thanks for the feedback!

  1. Regarding names, I understand and think it's a good idea, but I have a few remarks:
  • I used "autonomy" as shorthand for what geriatricians call "functional status", so since we are renaming things, I think it would be best to rename eds.autonomy to eds.functional_status.
  • I think a better name for eds.thymic would be eds.psychological_status, as it fits geriatricians' jargon better than eds.mood_status.
  • I'm a bit skeptical about leaving eds.frailty as is if everything else is renamed. I'm afraid it could be misleading, implying that this is a "one-size-fits-all" magical component that extracts everything related to frailty, whereas in reality it only captures explicit mentions of frailty (e.g., "Le patient est fragile", i.e. "The patient is frail") and misses basically everything covered by the other components. Perhaps something along the lines of eds.frailty_mentions? Let me know what you think.

As for the rest, your proposals look great to me!

  2. Regarding coverage: sure, I can look into that!

@percevalw
Member

I'm OK with all your comments. Regarding the pipe names, could you edit the list and suggest a final mapping that goes in the same direction of being more explicit?

@theoimbert-aphp
Collaborator Author

Sure, the final mapping would be as follows:

  • eds.autonomy -> eds.functional_status
  • eds.cognition -> eds.cognitive_status
  • eds.frailty -> eds.frailty_mentions
  • eds.general_status -> eds.global_health_status
  • eds.geriatric_assessment -> eds.geriatric_assessment
  • eds.incontinence -> eds.incontinence_status
  • eds.mobility -> eds.mobility_status
  • eds.nutrition -> eds.nutritional_status
  • eds.pain -> eds.pain_status
  • eds.polymed -> eds.polypharmacy_status
  • eds.sensory -> eds.sensory_status
  • eds.social -> eds.social_status
  • eds.thymic -> eds.psychological_status
  • eds.adl -> eds.adl_score
  • eds.bref -> eds.bref_score
  • eds.chair_stand -> eds.chair_stand_score
  • eds.en_eva -> eds.pain_rating_score
  • eds.g8 -> eds.g8_score
  • eds.gait_speed -> eds.gait_speed_score
  • eds.gds -> eds.geriatric_depression_scale_score
  • eds.iadl -> eds.iadl_score
  • eds.mini_cog -> eds.mini_cog_score
  • eds.mini_gds -> eds.mini_gds_score
  • eds.mms -> eds.mini_mental_state_score
  • eds.moca -> eds.moca_score
  • eds.ps -> eds.ecog_performance_status_score
  • eds.rockwood -> eds.clinical_frailty_scale_score
  • eds.sppb -> eds.sppb_score
  • eds.tug -> eds.timed_up_and_go_score
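One way to apply this rename without immediately breaking existing pipelines and configs is to keep the old names as deprecated aliases. The sketch below is hypothetical: `RENAMED_PIPES` and `resolve_pipe_name` are illustrative names, not part of EDS-NLP, and it does not show how EDS-NLP actually registers its factories. It takes eds.g8 (as listed in the PR description) as the original name of the G8 component.

```python
import warnings

# Final old -> new mapping agreed in this thread.
# eds.geriatric_assessment keeps its name, so it is not listed.
RENAMED_PIPES = {
    "eds.autonomy": "eds.functional_status",
    "eds.cognition": "eds.cognitive_status",
    "eds.frailty": "eds.frailty_mentions",
    "eds.general_status": "eds.global_health_status",
    "eds.incontinence": "eds.incontinence_status",
    "eds.mobility": "eds.mobility_status",
    "eds.nutrition": "eds.nutritional_status",
    "eds.pain": "eds.pain_status",
    "eds.polymed": "eds.polypharmacy_status",
    "eds.sensory": "eds.sensory_status",
    "eds.social": "eds.social_status",
    "eds.thymic": "eds.psychological_status",
    "eds.adl": "eds.adl_score",
    "eds.bref": "eds.bref_score",
    "eds.chair_stand": "eds.chair_stand_score",
    "eds.en_eva": "eds.pain_rating_score",
    "eds.g8": "eds.g8_score",
    "eds.gait_speed": "eds.gait_speed_score",
    "eds.gds": "eds.geriatric_depression_scale_score",
    "eds.iadl": "eds.iadl_score",
    "eds.mini_cog": "eds.mini_cog_score",
    "eds.mini_gds": "eds.mini_gds_score",
    "eds.mms": "eds.mini_mental_state_score",
    "eds.moca": "eds.moca_score",
    "eds.ps": "eds.ecog_performance_status_score",
    "eds.rockwood": "eds.clinical_frailty_scale_score",
    "eds.sppb": "eds.sppb_score",
    "eds.tug": "eds.timed_up_and_go_score",
}


def resolve_pipe_name(name: str) -> str:
    """Return the current name of a pipe, warning if `name` is a deprecated alias."""
    new_name = RENAMED_PIPES.get(name)
    if new_name is None:
        # Unknown or unchanged names are passed through as-is
        return name
    warnings.warn(
        f"'{name}' is deprecated, use '{new_name}' instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this helper
    )
    return new_name
```

Calling `resolve_pipe_name("eds.ps")` returns "eds.ecog_performance_status_score" and emits a `DeprecationWarning`, while names outside the mapping, such as eds.geriatric_assessment, are returned unchanged. Keeping the mapping in a single dict also makes it easy to check that no two old names collide on the same new name.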

@sonarqubecloud

sonarqubecloud Bot commented Apr 2, 2026



2 participants