add DepthScore by Sohaib-Ahmed21 · Pull Request #3318 · Lightning-AI/torchmetrics

Sohaib-Ahmed21 · 2026-01-25T20:53:10Z

What does this PR do?

Adds the DepthScore metric for evaluating text generation. The implementation follows the BERTScore architecture and logic closely, as both metrics:

- Extract contextual embeddings from transformer models
- Compare sentence representations using token-level embeddings
- Support custom models, multiple references, and various configuration options

The key difference is that DepthScore measures the distance between embedding distributions using depth-based statistical methods (integrated rank-weighted depth, Wasserstein distance, MMD, etc.) instead of token-level cosine similarity.
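To make the contrast with token-level cosine similarity concrete, here is a minimal sketch of one of the distribution-level measures named above (MMD), written in plain PyTorch. It is not the code from this PR: the function name `rbf_mmd2`, the RBF kernel, the `bandwidth` value, and the random stand-in embeddings are all assumptions chosen for illustration.

```python
import torch


def rbf_mmd2(x: torch.Tensor, y: torch.Tensor, bandwidth: float = 1.0) -> torch.Tensor:
    """Biased estimate of squared MMD between two sets of token embeddings.

    x has shape (n_tokens_x, dim) and y has shape (n_tokens_y, dim).
    The RBF kernel and the bandwidth are illustrative assumptions.
    """

    def kernel(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Pairwise squared Euclidean distances, then a Gaussian kernel.
        sq_dist = torch.cdist(a, b, p=2).pow(2)
        return torch.exp(-sq_dist / (2.0 * bandwidth**2))

    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


torch.manual_seed(0)
candidate = torch.randn(12, 16)        # stand-in for candidate token embeddings
reference = torch.randn(9, 16) + 3.0   # stand-in for a shifted reference cloud

same = rbf_mmd2(candidate, candidate)  # identical clouds give (numerically) zero
diff = rbf_mmd2(candidate, reference)  # shifted clouds give a clearly positive value
```

The two sentences may have different token counts, which is why the score compares whole embedding clouds rather than aligned token pairs.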

Fixes #854

Before submitting

Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
[x] Did you read the contributor guideline, Pull Request section?
Did you make sure to update the docs?
Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

📚 Documentation preview 📚: https://torchmetrics--3318.org.readthedocs.build/en/3318/


Sohaib-Ahmed21 · 2026-01-28T03:36:27Z

@bhimrazy @justusschock this PR is ready for review. Kindly approve the workflows and review it, thanks!

codecov · 2026-01-28T10:17:04Z

Codecov Report

❌ Patch coverage is 19.35484% with 300 lines in your changes missing coverage. Please review.
✅ Project coverage is 36%. Comparing base (bfcc276) to head (d5d520a).

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #3318    +/-   ##
=======================================
- Coverage      37%     36%    -1%     
=======================================
  Files         364     351    -13     
  Lines       20098   20273   +175     
=======================================
- Hits         7520    7396   -124     
- Misses      12578   12877   +299


Sohaib-Ahmed21 · 2026-01-30T03:32:06Z

@bhimrazy @rittik9 Please approve the workflows.

rittik9 · 2026-01-31T07:41:19Z

Thank you @Sohaib-Ahmed21 for this PR. Will take a look...

Sohaib-Ahmed21 · 2026-02-03T22:33:31Z

@rittik9 @bhimrazy all required checks are passing. Please review the PR, thanks!

justusschock

Why are we doing all the operations here in NumPy rather than PyTorch? It should be easier if we don't need to do conversions, no?

justusschock · 2026-02-25T17:33:57Z

+    return torch.stack(out, dim=0)
+
+
+def cov_matrix(x: np.ndarray, robust: bool = False) -> np.ndarray:


what's the issue with torch.cov?

sklearn is a heavy requirement just for this.
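For reference, a minimal sketch of what the `torch.cov` route could look like for the plain (non-robust) case. It assumes the input is laid out as `(n_samples, n_features)`; the helper name `cov_matrix_torch` is hypothetical, and whatever the `robust=True` branch does in the PR is not reproduced here.

```python
import torch


def cov_matrix_torch(x: torch.Tensor) -> torch.Tensor:
    """Sample covariance of x with shape (n_samples, n_features).

    Hypothetical torch-only counterpart of the non-robust path; the
    (n_samples, n_features) layout is an assumption.
    """
    # torch.cov treats rows as variables and columns as observations,
    # so the input is transposed first.
    return torch.cov(x.T)


torch.manual_seed(0)
emb = torch.randn(50, 4, dtype=torch.float64)
cov = cov_matrix_torch(emb)

# Cross-check against the textbook formula with the unbiased (n - 1) divisor.
centered = emb - emb.mean(dim=0, keepdim=True)
manual = centered.T @ centered / (emb.shape[0] - 1)
```

Staying in torch also keeps the tensor on its original device and dtype, which avoids the conversions mentioned in the earlier comment.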

justusschock · 2026-02-25T17:37:04Z

+            containing `"input_ids"` and `"attention_mask"`.
+        target: Reference sentence(s) as `str`, `Sequence[str]`, multi-reference
+            `Sequence[Sequence[str]]`, or tokenized dict containing `"input_ids"` and `"attention_mask"`.
+        model_name_or_path: Hugging Face model name/path used when `model` is not provided.


can we unify this with model? if model is a string, we use that as name or path and if it's a callable we'll use it straight away?
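A rough sketch of the dispatch this suggestion describes, with one `model` argument that accepts either a string or a callable. The helper names `resolve_model` and `_load_pretrained` and the injectable `loader` parameter are hypothetical; using `AutoModel.from_pretrained` as the default loader is an assumption about how a name or path would be resolved.

```python
from typing import Any, Callable, Union


def _load_pretrained(name_or_path: str) -> Any:
    # Imported lazily so that transformers is only required when a string is passed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(name_or_path)


def resolve_model(
    model: Union[str, Callable[..., Any]],
    loader: Callable[[str], Any] = _load_pretrained,
) -> Any:
    """Return a usable model from either a name/path or an already built callable."""
    if isinstance(model, str):
        # A string is treated as a Hugging Face model name or local path.
        return loader(model)
    if callable(model):
        # A callable (for example a torch.nn.Module) is used as-is.
        return model
    raise TypeError(f"`model` must be a str or a callable, got {type(model).__name__}")


def fake_loader(name: str) -> Any:
    # Stand-in loader so the example runs without downloading anything.
    return ("loaded", name)


def identity(x: Any) -> Any:
    return x


from_name = resolve_model("some-model-name", loader=fake_loader)
from_callable = resolve_model(identity)
```

With this shape, `model_name_or_path` would no longer be needed as a separate argument.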

Sohaib-Ahmed21 · 2026-03-17T09:42:51Z

Thanks for the review @justusschock. Will address this as soon as possible.

Sohaib-Ahmed21 added 3 commits January 25, 2026 10:17

Add depth score metric end to end

2c1b0bc

Fix pre-commit failures and add reduction param to main interface

c8be46a

Rename measure param to depth_measure and provided test coverage for all depth measures

b57be3f

Sohaib-Ahmed21 requested review from SkafteNicki, justusschock and lantiga as code owners January 25, 2026 20:53

github-actions Bot added the documentation and topic: Text labels Jan 25, 2026

Sohaib-Ahmed21 and others added 9 commits January 25, 2026 13:07

Add reference to original implementation properly

0b7a8f5

Add new depthscore specific dependencies to text dependencies in repo

7e0929c

Fix typo

e514181

Handle depth score specific dependencies, their imports and related test cases through utils properly

cd1a94f

Fix RST formatting error

52a3d3e

Change POT version to >=0.9.0 to fix Cython dependency issue in CI

8ef1573

Merge branch 'master' into feature/854_depth_score_metric

9d528ff

Proper test skipping for DepthScore when nlg_eval_via_simi_measures is missing, fixes ddp Skipped exception failures

0454a48

Merge branch 'feature/854_depth_score_metric' of https://github.com/Sohaib-Ahmed21/torchmetrics into feature/854_depth_score_metric

b5bfe9b

Sohaib-Ahmed21 and others added 2 commits January 29, 2026 07:57

Fix code quality check failures

dc90a55

Merge branch 'master' into feature/854_depth_score_metric

3388d88

Merge branch 'master' into feature/854_depth_score_metric

d5d520a

justusschock reviewed Feb 25, 2026


Sohaib-Ahmed21 added 2 commits March 8, 2026 01:26

Merge branch 'master' into feature/854_depth_score_metric

a52c238

Merge branch 'master' into feature/854_depth_score_metric

9f6b059

Sohaib-Ahmed21 wants to merge 17 commits into Lightning-AI:master from Sohaib-Ahmed21:feature/854_depth_score_metric

3 participants
